Part I - The Structure of Neural Networks


The neural network structure is the initial axiomatic basis for neural network theory
itself, as well as for neural network solution algorithms, the architecture of neural chips,
and neural computer architecture.

Below we present the main reasons for the transition from the logical basis of Boolean
elements to the threshold logical basis. We describe the main threshold elements and
their relationship with multivalued and continuous logic, as well as the main neural
network types proposed by different authors, both in the 1960s (Rosenblatt, Widrow,
etc.) and at present. Material on continual neural networks (a continuum of features,
neurons, etc.) is also presented. We also consider separately some objective reasons for
introducing cross connections in multilayer neural networks, as well as a formal
description of methods for such neural networks.


Introduction 


1.1 
Neural Computers 


Neurocomputers are computers of a new class. Their appearance was driven by two
objective factors: first, the principal stages in the development of modern element base
technology, which largely determines the development of computer architecture; and
second, the practical need to solve specific problems faster and more economically.

As far back as the 1950s, the main driving force behind neural computer development
was the development of threshold logic, which was consistently set against the classical
element base built from AND, OR, NOT, etc. This resulted in a series of specific
problem-oriented and experimental universal neural computers in the 1960s and 1970s.
The terms “neural computer” and “neurocomputer” are not associated with any feature
or characteristic of the human or animal nervous system. They refer only to the
conventional name of a threshold element with adjustable or fixed weights that
implements an elementary transfer function of the neural cell. The sharp upswing of LSI
technology at the beginning of the 1970s, and the fact that it was used primarily to
implement microprocessor chips with the classical computer architecture on an element
base of AND, OR, NOT, etc., slowed down, but did not completely stop, the development
of computing facilities based on threshold logic elements. The next upswing, at the
beginning of the 1980s, made it possible to redefine the problem of neural computers:
VLSI technology, unlike LSI technology, allowed one to implement on one or several
chips not only a great number of neuron processing elements but also the whole set of
connections between them, which had not been possible before. By the middle of the
1980s, both electronic and optical implementation methods provided this capability.

The main idea in constructing a neural computer, whether problem-oriented or
universal, is to build it in analog-digital form. The “fast” analog part performs
multidimensional operations in the threshold basis. The algorithms for adjusting the
neural network coefficients are implemented in one of three ways: in a “fast” manner
in analog form, in a “low-speed” manner as specialized digital circuits emulating neural
algorithms, or in a “low-speed” manner in purely digital form, for example, on a
universal personal computer.

The development of neural computers requires the design of fundamentally new
algorithms for the multidimensional solution of problems. The time needed to solve a
specific problem depends, on the one hand, only linearly on the problem dimensionality
and, on the other hand, on the convergence time of the iterative solution process in the
particular neural network.

The main formal basis for designing neural algorithms is neural network theory
[I-1 to I-7]. Let us call the main set of operations performed in the course of the
algorithm the “logical basis of the problem”. For the majority of problems, this is the
basis $\{\sum_i a_i x_i\}$ of weighted sums. The following problems require this basis:
problems of vector algebra, the Fourier transform, and optimization problems.

A class of tasks that includes the solution of ordinary differential equations and of the
Poisson, Euler, Navier-Stokes, elliptic, and other equations can be reduced to the
problems listed above.
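The sketch below shows one way such a reduction can work. It assumes a linear system $dx/dt = Ax$ and the explicit Euler scheme; both are our choices for brevity, not the author's. Each time step $x \leftarrow x + h\,Ax$ is then a matrix-vector product, i.e., a set of weighted sums. The names `matvec` and `euler_linear_ode` are hypothetical.

```python
def matvec(a, x):
    """Matrix-vector product: each output component is a weighted sum of the inputs."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def euler_linear_ode(a, x0, step, n_steps):
    """Integrate dx/dt = A x with explicit Euler steps x <- x + h * (A x)."""
    x = list(x0)
    for _ in range(n_steps):
        ax = matvec(a, x)  # the whole step reduces to weighted sums
        x = [x_i + step * dx_i for x_i, dx_i in zip(x, ax)]
    return x


if __name__ == "__main__":
    # Harmonic oscillator x' = y, y' = -x, from (1, 0) up to t = 1.
    # The exact solution is (cos t, -sin t), i.e. about (0.5403, -0.8415).
    print(euler_linear_ode([[0.0, 1.0], [-1.0, 0.0]], [1.0, 0.0], 0.001, 1000))
```

Explicit Euler is only first-order accurate. With $h = 0.001$, the result above agrees with the exact solution to roughly three decimal places.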

The logical basis of a computer system is the main group of operations implemented
by the elements of its basic operating device. For classical computers, this is the AND,
OR, NOT basis. From it are built, first, the level of more complex bases (Sheffer stroke,
multi-input AND, OR, NOT, etc.) and, second, a macro level, i.e., the level of microdevices.
The logical basis of the computer system is not determined by the logical basis of the
problems being solved, so bridging the two requires a rather complex program
development system.

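In the case of neural computers, the logical basis of the computer system is, in the
simplest case, the basis $\{\sum_i a_i x_i, \operatorname{sign}\}$, i.e., weighted summation
followed by a threshold (sign) function. This basis corresponds most closely to the
logical basis of the majority of the problems being solved.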
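Here is a minimal sketch of one element of this basis. It assumes bipolar encoding (false = -1, true = +1) and the convention sign(0) = +1. The element computes $\operatorname{sign}(a_0 + \sum_i a_i x_i)$. With hand-picked fixed weights it reproduces the Boolean AND, OR and NOT operations, which illustrates that the threshold basis contains the classical one. The weights and function names are our own illustration.

```python
def sign(s):
    """Threshold nonlinearity; sign(0) = +1 is an assumed convention."""
    return 1 if s >= 0 else -1


def threshold_element(weights, bias, inputs):
    """Element of the {sum a_i x_i, sign} basis: sign(a_0 + sum_i a_i x_i).

    The bias a_0 can be viewed as the weight of a constant +1 input.
    """
    return sign(bias + sum(a * x for a, x in zip(weights, inputs)))


# Bipolar encoding: -1 = false, +1 = true. Weights are fixed by hand.
def and_gate(x1, x2):
    return threshold_element([1, 1], -1, [x1, x2])


def or_gate(x1, x2):
    return threshold_element([1, 1], 1, [x1, x2])


def not_gate(x):
    return threshold_element([-1], 0, [x])


if __name__ == "__main__":
    for x1 in (-1, 1):
        for x2 in (-1, 1):
            print(x1, x2, and_gate(x1, x2), or_gate(x1, x2), not_gate(x1))
```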

When problems are solved in the logical basis of neural computers, the basis of the
problem matches the basis of the computer system, and there are no artificial shifts in
either direction:


• one such shift occurs when the problem has a threshold basis, whereas the basis of
the computer system is AND, OR, NOT;

• the other shift occurs when the problem has a basis different from the threshold
one, whereas the computer system is neural.


It is assumed that matching the basis of the computer system to the basis of the
problem yields the highest performance. This statement is trivial for problem-oriented
computers designed to solve a given specific task. It is not trivial, however, for neural
computers, which at present claim to be universal.

All the difficulties of multiextremal search by iterative methods evidently remain when
neural computers are used. However, these difficulties move from the software
implementation (von Neumann computers, computers with SIMD and MIMD
architectures) to the software/hardware implementation. In short, the algorithmic
kernel of the main body of applied problems can be realized in hardware or
software/hardware form with maximal operation speed. The neural computer is a
maximally parallelized system for implementing a given algorithmic kernel.

Consider the number of operation cycles in the problem-solving process, i.e., the number
of adjustment cycles for optimizing the secondary functional in the neural computer. It
is not determined by the subjective intuition of a circuit designer who distributes the
processing among circuit layers of Boolean elements. Nor is it determined by the
subjective views of a programmer who organizes the interaction between layers. Instead,
it is determined by the physical nature and complexity of the problem.
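The sketch below illustrates how the number of adjustment cycles follows from the problem rather than from a designer's choice. It trains a single threshold element with Rosenblatt's perceptron correction rule, used here as a simple stand-in for general adjustment algorithms, and counts full passes over the data until every sample is classified correctly. The bipolar encoding and the name `train_threshold_element` are assumptions. A linearly separable problem such as AND converges after a few cycles. XOR, which no single threshold element can represent, never converges.

```python
def train_threshold_element(samples, max_cycles=1000):
    """Adjust a threshold element with the perceptron rule until all samples are correct.

    samples: list of (inputs, target) pairs with target in {-1, +1}.
    Returns (weights, bias, cycles); cycles counts full passes over the samples.
    Raises RuntimeError if no error-free pass occurs within max_cycles.
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for cycle in range(1, max_cycles + 1):
        errors = 0
        for inputs, target in samples:
            s = bias + sum(w * x for w, x in zip(weights, inputs))
            output = 1 if s >= 0 else -1
            if output != target:
                # Perceptron correction: shift the weights toward the target.
                weights = [w + target * x for w, x in zip(weights, inputs)]
                bias += target
                errors += 1
        if errors == 0:
            return weights, bias, cycle
    raise RuntimeError("no error-free cycle within max_cycles")


AND_SAMPLES = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
XOR_SAMPLES = [((-1, -1), -1), ((-1, 1), 1), ((1, -1), 1), ((1, 1), -1)]

if __name__ == "__main__":
    print(train_threshold_element(AND_SAMPLES))  # converges after a few cycles
    try:
        train_threshold_element(XOR_SAMPLES, max_cycles=100)
    except RuntimeError as exc:
        print("XOR:", exc)  # not linearly separable: no suitable weights exist
```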




Homogeneous neural networks degrade gradually when some of their elements break
down. Rosenblatt noticed this phenomenon while designing the three-layer perceptron
with arbitrary connections in the first layer, in which an excessive number of first-layer
elements was used. In this case, the function implemented by the neural network is
quasi-distributed over the structure. Neural computers are the first example of an
analytically calculated computer structure, as opposed to one designed empirically from
subjective views about the problem and the element base.
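Here is a toy illustration of gradual degradation; the setup and all parameters are hypothetical. A first layer of many identical threshold units, each computing sign(x + b_i) with a small random offset b_i, feeds a summing threshold element whose target is sign(x). A failed unit contributes zero. Because the function is spread over many redundant units, knocking out a large fraction of them leaves the output essentially unchanged. Losing all of them is fatal.

```python
import random


def layer_output(x, offsets, alive):
    """Summing threshold element fed by a homogeneous first layer.

    Working unit i outputs sign(x + offsets[i]); a failed unit outputs 0.
    """
    total = sum((1 if x + b >= 0 else -1) for b, ok in zip(offsets, alive) if ok)
    return 1 if total >= 0 else -1


def accuracy_with_failures(n_units, failure_fraction, seed=0):
    """Share of test points where the network still reproduces sign(x)."""
    rng = random.Random(seed)
    offsets = [rng.uniform(-0.5, 0.5) for _ in range(n_units)]
    alive = [rng.random() >= failure_fraction for _ in range(n_units)]
    points = [i / 10 for i in range(-20, 21) if i != 0]
    hits = sum(layer_output(x, offsets, alive) == (1 if x > 0 else -1) for x in points)
    return hits / len(points)


if __name__ == "__main__":
    for fraction in (0.0, 0.3, 0.6, 0.9, 1.0):
        print(fraction, accuracy_with_failures(101, fraction))
```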

The neural computer implementation methods are mainly divided into three classes: 


1. Software emulation of neural algorithms on computers with SISD architecture (for
example, classical personal computers), SIMD architecture (for example, Connection
Machines) or MIMD architecture (for example, transputer networks);

2. Software/hardware emulation of neural units on a digital element base that
accelerates the array of operations in the threshold basis, first of all multiplication
and addition (Weitek processors, signal processors of the TMS32020 type, etc.);

3. Hardware implementation of the neural unit on an element base characteristic of
neural algorithms (CMOS neurochips, optically controlled transmission-type
indicators, holography, etc.).


The efficiency of implementing neural algorithms for specific problems increases from
variant 1 to variant 2 and further to variant 3.

When the neural unit is implemented in hardware or emulated in software/hardware,
the neural computer has the classical structure of a problem-oriented computer with
SIMD architecture. In more complex cases, it has a mixed MSIMD architecture. The
computer is then a network of asynchronously operating computer units. Each unit is
connected to a group of synchronously operating elements that implement part of the
neural algorithm.
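The MSIMD organization can be sketched as follows. Threads stand in for the asynchronously operating computer units, and a list comprehension stands in for the lockstep operation of a synchronous unit. The block partitioning and all names are assumptions for illustration. Each node computes sign(w · x) for its own block of neurons, and the kernel gathers the partial outputs. In CPython, threads show the structure rather than deliver a real parallel speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def synchronous_unit(weight_rows, x):
    """One synchronous unit: all its elements apply sign(w . x) to the same input in lockstep."""
    return [1 if sum(w * x_i for w, x_i in zip(row, x)) >= 0 else -1 for row in weight_rows]


def msimd_layer(weights, x, n_nodes):
    """Distribute the neurons of one layer over asynchronously running nodes.

    Each node (a thread here) owns a contiguous block of weight rows and drives
    its own synchronous unit; the kernel gathers the blocks in node order.
    """
    block = -(-len(weights) // n_nodes)  # ceiling division
    blocks = [weights[i:i + block] for i in range(0, len(weights), block)]
    with ThreadPoolExecutor(max_workers=n_nodes) as kernel:
        partial = list(kernel.map(synchronous_unit, blocks, [x] * len(blocks)))
    return [y for part in partial for y in part]


if __name__ == "__main__":
    demo_weights = [[1, -1], [-1, 1], [2, 1], [0, -3], [1, 1]]
    print(msimd_layer(demo_weights, [0.5, 1.0], n_nodes=2))
```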

Supercomputers such as the CRAY X-MP and CYBER 205 had enormous computing
capacity. However, they were very expensive, and their architecture did not correspond
to the principles of neural processing. Their high processing capacity was achieved
through array processors, special pipelined processors, and reduced cycle time. As a
rule, such supercomputers were built on the most modern technology. They reached a
point in their development where any further increase in operation speed was limited
by the physical constraints on signal propagation in computer circuits. The way out of
this “technological” dead end was fine-grained processing. It was implemented by
computers with SIMD and MIMD architectures of the Connection Machine, Intel
Hypercube, Ncube, Meiko Computing Surface types, and so on, and in particular by
“neural” processing.

There are different approaches to revealing parallelism and implementing it on parallel
processors:


• Parallelism of sub-problems or events;
• Explicit algorithmic parallelism;
• Geometric parallelism.
The first type of parallelism is efficient for a program that is executed many times
with different parameters. In this case, it is desirable to represent the program as a set
of independent sub-problems (each neuron with its own parameters) and to execute
the sub-problems in parallel on different processors.
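Here is a minimal sketch of sub-problem parallelism using Python's standard `concurrent.futures` module. The same program, here a hypothetical `neuron_response` that evaluates one neuron on a fixed input, is run independently for several parameter sets, and the executor distributes the runs over workers. The input and parameter values are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor

FIXED_INPUT = (1.0, -2.0, 0.5)  # illustrative input shared by all sub-problems


def neuron_response(params):
    """Hypothetical sub-problem: output of one neuron with its own (weights, bias)."""
    weights, bias = params
    s = bias + sum(w * x for w, x in zip(weights, FIXED_INPUT))
    return 1 if s >= 0 else -1


PARAMETER_SETS = [
    ((0.5, 0.1, 1.0), 0.0),
    ((-1.0, 0.0, 0.0), 0.2),
    ((0.0, 1.0, 0.0), 1.5),
]

if __name__ == "__main__":
    # The runs are independent, so they can be handed to workers in any order.
    with ThreadPoolExecutor() as pool:
        print(list(pool.map(neuron_response, PARAMETER_SETS)))
```

For CPU-bound work in CPython, `ProcessPoolExecutor` would normally replace `ThreadPoolExecutor`. The thread version is used here only because it needs no extra setup.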

An array processor for floating-point multiplication is an example of explicit
algorithmic parallelism. Parallel processing in this case consists in assigning different
sub-problems to the stages of a processor pipeline. Each stage performs some operation
on the data and passes the result on. For the pipeline to perform efficiently, one must
balance the processors' load and access and take into account the time required to
transfer data from one processing stage to the next.
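The balancing requirement can be made quantitative with a simple model. It assumes identical items, unlimited buffers between stages, and negligible transfer time. Under these assumptions, the first result appears after the sum of all stage times, and each further result follows after the time of the slowest stage. The sketch also chains Python generators as a sequential model of the data flow through the stages; in hardware the stages would work on different items simultaneously. All names are our own.

```python
from typing import Callable, Iterable, Iterator


def stage(operation: Callable, upstream: Iterable) -> Iterator:
    """One pipeline stage: apply `operation` to each item and pass the result on."""
    for item in upstream:
        yield operation(item)


def pipeline(source: Iterable, *operations: Callable) -> Iterator:
    """Chain stages so that each stage consumes the output of the previous one."""
    stream = iter(source)
    for operation in operations:
        stream = stage(operation, stream)
    return stream


def pipeline_time(stage_times, n_items):
    """Completion time for n_items: fill latency plus (n_items - 1) steps of the slowest stage."""
    return sum(stage_times) + (n_items - 1) * max(stage_times)


if __name__ == "__main__":
    print(list(pipeline([1, 2, 3], lambda v: v * 2, lambda v: v + 1)))
    # Same total work per item (3 time units), balanced versus unbalanced stages.
    print(pipeline_time([1, 1, 1], 100), pipeline_time([0.5, 2, 0.5], 100))
```

With 100 items, the balanced pipeline finishes in 102 time units and the unbalanced one in 201, although both spend 3 units per item in total.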

Geometric parallelism consists in decomposing the data among processors so that all
the data a processor requires are located in its own random access memory or in that of
nearby processors. Geometric parallelism is the standard approach for computers with
SIMD architecture. Practical experience shows that many of the calculations required
for neural network modeling are local. This makes it practical to implement neural
computers on the basis of geometric parallelism. That is why neural computers with
hardware or software/hardware implementation of the neural unit belong to the classes
of SIMD or MSIMD architecture, as mentioned above.
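Here is a minimal sketch of geometric parallelism. It assumes a one-dimensional array and a three-point averaging update as a stand-in for a local neural computation. The array is split into contiguous blocks, one per simulated processor. Each processor needs only its own block plus one "halo" cell from each neighbour, and the result is identical to the undecomposed update. All names are hypothetical.

```python
def smooth_global(u):
    """Local update: every interior point becomes the mean of itself and its two neighbours."""
    return [u[0]] + [(u[i - 1] + u[i] + u[i + 1]) / 3 for i in range(1, len(u) - 1)] + [u[-1]]


def smooth_decomposed(u, n_procs):
    """The same update computed block by block, as if by n_procs processors.

    Processor p stores only its block [start, stop) plus one halo cell on each
    side, which is all the local update needs.
    """
    n = len(u)
    result = []
    for p in range(n_procs):
        start, stop = p * n // n_procs, (p + 1) * n // n_procs
        lo, hi = max(start - 1, 0), min(stop + 1, n)
        local = u[lo:hi]  # own block plus halo cells copied from neighbours
        for i in range(start, stop):
            j = i - lo
            if i == 0 or i == n - 1:
                result.append(local[j])  # fixed boundary values
            else:
                result.append((local[j - 1] + local[j] + local[j + 1]) / 3)
    return result


if __name__ == "__main__":
    u = [float((i * i) % 7) for i in range(12)]
    print(smooth_decomposed(u, 3) == smooth_global(u))
```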

A clear example of a mathematical operation whose parallel implementation
corresponds to the processes taking place in a neural network is matrix-vector
multiplication. Here the vector represents the input signal of a one-layer neural
network, and the matrix represents the coefficients of the layer's neurons. The output
signals of the neural network are the result of the multiplication followed by a nonlinear
transformation.
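The correspondence can be written down directly. This sketch assumes tanh as the nonlinear transformation (any activation can be passed in) and represents the matrix as a list of rows, one row of coefficients per neuron. The name `layer_forward` is our own.

```python
import math


def layer_forward(weights, x, activation=math.tanh):
    """One-layer network: y_j = f(sum_i w_ji * x_i).

    Row j of `weights` holds the coefficients of neuron j; every neuron sees the
    same input vector x, so the layer is a matrix-vector product followed by f.
    """
    return [activation(sum(w * x_i for w, x_i in zip(row, x))) for row in weights]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(layer_forward(W, [0.5, -0.5]))
```

Because each row is processed independently, every neuron's weighted sum can be computed by a separate processing element at the same time.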

Neurocomputers are an object of interdisciplinary research. Consequently, the
neurocomputer can only be defined against the background of several definitions, each
suited to a different branch of science.


Mathematical statistics. Neurocomputers are systems that make it possible to form
descriptions of stochastic processes and their ensembles with complex, often
multimodal, or a priori unknown distribution functions.


Mathematical logic and automata theory. Neurocomputers are systems in which the
solution algorithm is represented by a logic network of a particular form, namely a
network of neurons, with Boolean elements of the AND, OR, NOT types completely
eliminated. As a consequence, specific connections between elements are introduced.


Threshold logic (1950s and 1960s, computers based on threshold logic).
Neurocomputers are systems in which the solution algorithm is represented as a
network of threshold elements. These elements have dynamically tunable coefficients
and adjustment algorithms that are independent of the dimensionalities of the input
space and of the network.

Practically all approaches based on threshold logic have limitations similar to those of
Boolean elements: they depend on the network dimensionality and on the
dimensionality of the elements' input space. This is so even though these approaches
outwardly resemble neural networks.




Control theory. The difficulties of synthesizing nonlinear dynamic control systems are
well known. In the case of neural computers, these difficulties are partially overcome by
taking a special case of the control object: one that is well formalized and is a multilayer
neural network. The dynamic process of adjusting this network represents the solution.
Practically all synthesis methods for adaptive control systems carry over to neural
networks in the form of this special case of the control object.


Computational mathematics. Neural computers implement solution algorithms in the
form of neural networks. This makes it possible to develop algorithms whose potential
parallelism exceeds that of any physical implementation. The set of neural network
solution algorithms forms a new, promising field of computational mathematics,
conventionally called neural mathematics.


Computer engineering. From the viewpoint of computer engineering, the neural
computer is a computer system with MSIMD architecture that has the following three
principal technological features:


• The processor element of the uniform structure is simplified to the level of a neuron;

• Connections between elements are very complex;

• Programming of the computational structure is replaced by adjustment of the
connection weights between processor elements.


A general definition of the neurocomputer. The neurocomputer is a computer system
whose hardware and software architecture is suited to executing algorithms
represented in the neural network logical basis.


1.2 
Position of Neural Computers in the Set of Large-Powered 
Computing Facilities 


One can conditionally divide large-powered computing facilities into two classes:
“large-grained” (consisting of a small number of processors) and “fine-grained”
(consisting of hundreds and thousands of processors). Neural computers belong to the
fine-grained class.

A class of computers can be considered a new class of computing facilities if two
conditions hold. First, its computers solve a rather wide range of universal problems.
Second, compared with computers of traditional types, they require a new hardware
implementation, new program development systems, and new solution algorithms.

According to this definition, large-grained processing computers of the CRAY type,
such as the “Elbrus” computers, represent a new class of computers compared with von
Neumann computers. Likewise, fine-grained processing computers with SIMD (Single
Instruction - Multiple Data) architecture represent a new class compared with
large-grained processing computers.
Each of the following computer classes represents a new class with respect to the
previous one:


• Computers with von Neumann architecture;

• Large-grained processing computers of the CRAY type, such as “Elbrus” computers;

• Fine-grained processing computers with SIMD architecture;

• Fine-grained processing computers with MIMD (Multiple Instruction - Multiple
Data) architecture;

• Fine-grained processing computers with the mixed architecture of MSIMD type
(Multiple variant of SIMD);

• Neural computers.


Among the four classes of fine-grained computers shown in Fig. I.1, computers with
SIMD architecture came first in evolutionary development. They were first designed in
the middle of the 1970s. The processor in this case is a single-bit processor with some
local memory. Solving sufficiently complex problems therefore required organizing the
synchronized operation of a set of such units.

The development of VLSI technology at the beginning of the 1980s led to the
construction of the first production prototypes of fine-grained computers with MIMD
(Multiple Instruction - Multiple Data) architecture. These devices were implemented as
asynchronous networks of 16- and 32-bit microcomputers, including transputer
networks.

Several designs of computers with mixed architecture were developed from 1985 to
1988. They consisted of a kernel in the form of a network of asynchronously functioning
processors. Each processor of such a kernel controls a synchronously functioning
network of ordinary processors with local memory. Such an architecture is sometimes
called a multiple variant of SIMD (MSIMD).

The MSIMD architecture repeats, at a new qualitative level (MIMD), the attempt to
parallelize algorithms on the basis of a network of synchronously functioning
processors. Such an attempt had already been made with SIMD structures.

In general, neural computers can be regarded as a particular case (or a further
development) of computers with MSIMD architecture. In them, the synchronized unit
at each of the asynchronously functioning “fine-grain” computers is not a simple
“vulgar” network of single-bit processors with memory. Instead, it is a meaningful
synchronized unit performing hardware/software (preferably purely hardware)
emulation of the neural algorithm. In the simplest case, such an algorithm is the
multiplication of a large-dimensional vector or matrix by a vector, but this holds only in
the simplest case. Neural computers are a special case of the MSIMD structure in which
a synchronously functioning “cluster” of single-bit processors has a special
organization, close to the hardware/software implementation of the main part of the
algorithm. In the neural case considered here, such a “cluster” is a hardware/software
implementation of the “neural” kernel of a number of algorithms.


SIMD → MIMD → MSIMD → Neural computers


Fig. I.1. Evolution of fine-grained computer development: SIMD - Single Instruction - Multiple Data;
MIMD - Multiple Instruction - Multiple Data; MSIMD - Multiple variant of SIMD

The hardware/software implementation of neural algorithms in the synchronized units
of neural computers most probably also solves two additional problems:


1. To minimize (or sometimes eliminate completely) the information interchange
between the nodes of the neural computer's asynchronous kernel during problem
solving. This is practically impossible for the majority of problems in transputer
networks or similar systems.

2. To solve so-called weakly formalized problems, such as learning for optimal
pattern recognition, self-learning (clusterization), etc.


It should be noted that, at present, the supercomputing level is the most important of
all possible applications of neurocomputers. This level is characterized by a shortage of
computing capacity under existing constraints. Here, the objective need to develop the
neural computer architecture as a natural development of the MIMD architecture can
also be explained as follows. A design engineer's natural desire to increase the capacity
of a node, or the number of nodes, in an MIMD architecture leads to the “dead state
region” (Fig. I.2). By the price-capacity criterion, the “dead state region” appears for the
following two reasons:


• The objective existence of inter-node exchanges, and the growth of the total losses
due to these exchanges as the number of nodes increases;
• The increase of the node cost as its capacity increases.


Trying to increase the capacity of an MIMD computer by increasing the capacity of
each node, or the number of nodes, leads to a dead end. The reason is that capacity then
grows more slowly (or significantly more slowly) than system cost. This happens because
node cost rises faster than node capacity. In addition, as the number of nodes increases,
so do the losses due to inter-node information exchange during problem solving.


Fig. I.2. Illustration of the limitation of MIMD architectural development when the node capacity or
the number of nodes increases (axes: number of nodes, capacity; the labeled area is the “dead state
region” in the MIMD architecture development)
We consider this to be the main reason for using a synchronously functioning neural
unit in each node of the MIMD architecture. Such a unit implements a neural network
in hardware/software form (purely hardware in the future). This neural network, in
turn, implements some given function.

Economic factors play, and will continue to play, a significant role in the development
of fine-grained supercomputers, especially computers with a great number of nodes.
The cost of a custom-designed transputer-like LSI rises with its complexity and
capacity. As a result, it will be used not as the base element of a large array but as the
base element of a “middle-size” commutation array. Each node of such an array is then
connected to a unit built from more uniform, i.e., less expensive, LSIs. These LSIs form a
co-processor that is problem-oriented either by its structure (a network of single-bit
processors with memory) or by its function (for example, a neural calculating machine).

We therefore consider that the role of custom-designed LSI transputers and
transputer-like elements will decrease in the domain of supercomputers with a
fine-grained structure, as their structure grows more complex and their transistor count
increases. The main tendency will be the massive use of these elements in personal
computers (computer cards, accelerators) and super-personal computers (blocks of
several cards for personal computers). To a lesser degree, they will also be used in
super-minicomputers (a column consisting of several dozen cards and several personal
computers, of the Meiko and Megaframe types). Systems containing several thousand
transputers and transputer-like elements are an objective but temporary phenomenon
in the supercomputer class. Moreover, this phenomenon occurs only in those
supercomputer applications that require unique supercomputer installations.

This suggests that, in the long term, developing supercomputers with a fine-grained
transputer-type structure of more than 10,000 nodes is only an extreme trend. The
further evolution of this trend is to be found in the architecture of the peripheral part of
each node in the commutation array of transputer elements of the future system. Taking
all of the above into account, one can conclude that neural computers represent an
effective line of development for supercomputer architectures.


1.3
The Concept of Computer Universalism 


Each computer is problem-oriented in the sense that it solves different problems with
different efficiency. However, such specialization becomes less and less pronounced
with each new class of computers, if only because the field of application expands. One
can imagine a qualitative pattern reflecting the degree of universalism of different
computer classes at any given time. A possible example is shown in Fig. I.3. At present,
neural computers are “more problem-oriented” than transputer-like computers, which
in turn are more problem-oriented than single-processor ones. However, this is a
question of the time and resources necessarily devoted to each line of development.


Fig. I.3. Qualitative pattern for estimating the level of universalism of computers of different classes
Figure I.4 gives a qualitative illustration of the potential that algorithm
parallelization offers on computing facilities of different classes applied to some
specified problem. As the potential for parallelization grows, so does the number of
operations executed per computer cycle. As a result, the time required for the solution
decreases.


1.4 
Neural Computer Modularity 


Neural computer modularity (modular extendibility) is determined by the objective 
requirement for the existence of a transputer or transputer-like kernel in its structure. 

Concrete examples of neural computer designs show that increasing their capacity,
given their actual overall structural characteristics, requires a transputer or
transputer-like kernel in their structure. Such a kernel is needed to organize
asynchronous information transmission between separate neural units, both during
preparation for the solution and during the solution process itself. The structure of such
a transputer or transputer-like kernel can vary. Sometimes the number of
asynchronously functioning nodes is sufficiently large (several hundred or several
thousand). It is then necessary to move from a kernel of the “lattice” or “torus” type to
one of the “hypercube” type, in order to increase the equivalent information traffic of
the kernel.
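To see why the hypercube helps, one can compare the worst-case number of hops between nodes, i.e., the network diameter. Here we use the diameter as a rough proxy for transmission cost; this proxy and all names below are our assumptions, not the author's measure of "equivalent traffic". Both topologies are vertex-transitive, so a breadth-first search from node 0 gives the diameter.

```python
from collections import deque


def eccentricity(start, neighbours):
    """Largest hop count from `start` to any reachable node (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours(node):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())


def torus_neighbours(side):
    """Two-dimensional side x side torus: four links per node, with wrap-around."""
    def neighbours(node):
        r, c = divmod(node, side)
        return [((r + 1) % side) * side + c, ((r - 1) % side) * side + c,
                r * side + (c + 1) % side, r * side + (c - 1) % side]
    return neighbours


def hypercube_neighbours(dim):
    """dim-dimensional hypercube: nodes differing in exactly one address bit are linked."""
    def neighbours(node):
        return [node ^ (1 << k) for k in range(dim)]
    return neighbours


if __name__ == "__main__":
    # Both networks are vertex-transitive, so the eccentricity of node 0 is the diameter.
    for side, dim in ((16, 8), (32, 10)):
        print(side * side, "nodes: torus", eccentricity(0, torus_neighbours(side)),
              "hops, hypercube", eccentricity(0, hypercube_neighbours(dim)), "hops")
```

For 1024 nodes, a 32 × 32 torus has diameter 32, while a 10-dimensional hypercube has diameter 10. The price is 10 links per node instead of 4.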

Unlike that of computers with MIMD architecture (Fig. I.2), the capacity of neural
computers does not depend critically on the capacity of the node computers of the
asynchronously functioning kernel. Therefore, there is no need to justify capacity
increases by necessarily pushing technology-based standards to their limits.

Some properties of the asynchronous kernel of the neural computer should be noted.
When the kernel is built from transputers, the hardware and software compatibility of
neural computers with transputers or other transputer-like elements is preserved.
Neural computers (Fig. I.1) are no longer something exotic, as they were in the 1960s.
They are a natural result of the evolutionary development of fine-grained computer
architecture.

As with transputer systems, neural computers can be used at different levels of
implementation:


• Personal computer built-in boards;

• Personal computer units consisting of several boards;

• Columns with control personal computers;

• Assemblies of columns with control personal computers (supercomputer level).


As with transputer systems, the main goal of neural computer design is to reach the
supercomputer or supercomputing level. That is why the construction of various neural
boards for personal computers should at present be regarded as a purely technological
and instrumental stage, or as a stage for solving some minor problems. The main
strategic problem remains the development of neural supercomputers. An example is
the development of neural computers at TRW Company (MARK I, II, III, IV, V, etc.).

The asynchronous kernel in neural computers will mainly perform two functions:


• Preparation and transmission of initial data before the solution begins, and receipt
of calculation results, for the separate neural units at each node of the
asynchronous kernel;

• Messaging during the solution process when the problem is insufficiently prepared
for solution by the neural computer. Such insufficient preparation, and with it the
messaging traffic in the asynchronous kernel, should be minimized.


1.5 
The Class of Problems Adequate to Neural Computers 


All the problems solved by computer systems can be conditionally divided into three 
classes: 


1. Formalized; 
2. Weakly formalized; 
3. Non-formalized. 


The class of formalized problems consists of problems with an explicit and transparent
solution algorithm that directly indicates the corresponding class of computers and
computer architecture (SISD, SIMD, MIMD, etc.) providing the best solution.

The class of weakly formalized problems consists of problems with either a non-unique
solution algorithm or a solution algorithm that does not allow a simple estimate of the
solution quality or of whether a solution can be attained. Problems of large
dimensionality usually belong to this class. They suffer from the so-called “curse of
dimensionality”, which makes it necessary to solve them by iterative procedures whose
convergence and precision are very difficult to estimate. The iterative procedures suited
to weakly formalized problems are sometimes used even for formalized problems when
the formal algorithm is very laborious (for example, for solving a system of linear
equations of sufficiently large dimensionality).
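As an example of such an iterative procedure, here is a sketch of the Jacobi method for $Ax = b$. The choice of method is ours; the tolerance and iteration limit are arbitrary assumptions. The stopping test only measures how much the iterate changed between steps. This is exactly the kind of indirect estimate of convergence and precision referred to above. Convergence is guaranteed, for example, when $A$ is strictly diagonally dominant.

```python
def jacobi(a, b, tol=1e-10, max_iter=10_000):
    """Solve A x = b by Jacobi iteration.

    Stops when no component changes by more than `tol`; this measures the step,
    not the true error. Strict diagonal dominance of A guarantees convergence.
    Returns (x, iterations).
    """
    n = len(b)
    x = [0.0] * n
    for iteration in range(1, max_iter + 1):
        x_new = [(b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
                 for i in range(n)]
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new, iteration
        x = x_new
    raise RuntimeError("Jacobi iteration did not converge within max_iter")


if __name__ == "__main__":
    # 4x + y = 1, 2x + 5y = 2 has the solution x = 1/6, y = 1/3.
    print(jacobi([[4.0, 1.0], [2.0, 5.0]], [1.0, 2.0]))
```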

The class of non-formalized problems consists of problems whose solution algorithms
contain parameters or functions given only implicitly, in the form of a description of
some class of input signals. Examples of such problems are pattern recognition,
clusterization or self-learning, the search for informative attributes, etc.

Notice that in principle there is a relationship between these problem classes and the type of computer architecture, but this relationship cannot be expressed directly. Formalized problems with an essentially sequential algorithm are evidently suited to the SISD architecture. However, many well-formalized problems can be solved much more efficiently with special parallelization methods, and the development of such parallel algorithms is a sufficiently large area of present-day mathematics. As a rule, a parallel algorithm for a formalized problem is developed for a specific computer architecture (SIMD, MIMD, etc.). Weakly formalized problems can of course be solved on a serial computer, but they are better suited to computers with SIMD and MIMD architectures. Moreover, weakly formalized problems were, to some degree, the reason for the development of computers with such architectures.
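Summation is a simple formalized problem that shows the difference between a sequential algorithm and a special parallelization method (this is our example, not the author's). The sequential version needs n - 1 dependent additions. Pairwise (tree) reduction needs only about log2(n) steps if a separate processing element performs each addition within a level. Both functions are hypothetical helpers that run on one processor and only count the steps of each schedule.

```python
def sequential_sum(values):
    """SISD-style summation of a non-empty list: one addition per step."""
    total, steps = values[0], 0
    for v in values[1:]:
        total += v
        steps += 1
    return total, steps

def tree_sum(values):
    """Pairwise (tree) reduction of a non-empty list. All additions within one
    level are independent, so each level counts as a single parallel step."""
    level, steps = list(values), 0
    while len(level) > 1:
        # Add neighbours pairwise; an odd last element is carried over unchanged.
        paired = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])
        level, steps = paired, steps + 1
    return level[0], steps

data = list(range(1, 17))      # 16 values
print(sequential_sum(data))    # (136, 15)
print(tree_sum(data))          # (136, 4)
```

For this example the parallel schedule needs 4 steps instead of 15. However, the tree method is a reorganized algorithm, not a direct copy of the sequential one.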

Computers with SIMD and MIMD architectures can thus be regarded as roughly adequate both to formalized problems with a sufficiently parallelized solution algorithm and to weakly formalized problems.

It was precisely non-formalized problems that motivated the development of neural computers about thirty years ago, although neural problems have in principle been solved on computers with SISD architecture, and there are attempts to solve such problems on computers with SIMD and MIMD architectures.

The following main problems must be solved in the development of neural computers:

• Development of a solution algorithm adequate to the neural computer structure, i.e., selection within the solution algorithm of a kernel whose structure matches a neural network with maximal parallelism (number of neurons in the layers, dimensionality of the attribute space);

• Development of structures and methods of neural network implementation adequate to the given class of problems;

• Development of neural network adjustment algorithms for solving the given problems, and analysis of their convergence;

• Development of neural network theory toward universalization of the neural kernel for an extended class of algorithms.


A uniform neural network structure was chosen as the architecture kernel for representing the solution algorithm for the following reasons:

1. Such a structure allows massively parallel synchronous execution of a large number of operations, which in turn consist of the simplest operations of addition, multiplication and nonlinear transformation.

2. Such a structure implements a sufficiently complex and flexible functional transformation of the input space into the output space.

3. Such a structure allows an analytical description of the transformation of the input state space into the output state space.

4. Such a structure allows the controlled adjustment of the network coefficients to be organized in adaptive mode.

5. In the future, the use of Gill's linear sequential machines will make it possible to solve the problem of analytical description. This in turn will make it possible to synthesize adaptation algorithms for multilayer neural networks [I-8].
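Reasons 1-3 can be made concrete with a minimal sketch of one network layer. It assumes the common form y_i = f(sum_j w_ij x_j + b_i) with a sigmoid nonlinearity f; the function names and numerical values are hypothetical. Each output uses only additions, multiplications and one nonlinear transformation. No output depends on another, so all outputs could be computed synchronously, and the whole input-to-output mapping is given by one explicit formula.

```python
import math

def sigmoid(s):
    """Logistic nonlinearity."""
    return 1.0 / (1.0 + math.exp(-s))

def layer(W, b, x, f=sigmoid):
    """One layer: y_i = f(sum_j W[i][j] * x[j] + b[i]).
    Each output depends only on the input x, so all rows could run in parallel."""
    return [f(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]

# Hypothetical layer mapping a 2-dimensional input space to a 3-dimensional output space.
W = [[0.5, -1.0],
     [2.0,  0.0],
     [0.0,  0.0]]
b = [0.0, -1.0, 0.0]
y = layer(W, b, [2.0, 1.0])
print([round(v, 3) for v in y])  # [0.5, 0.953, 0.5]
```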


1.6 
Methods of Coefficient Readjustment 


Methods of readjusting the neural network weight coefficients in neural computers can be classified as follows:

• Technological methods (applied at the production stage), such as those used, for example, in the production of GaAs optical neural chips;

• Circuit design methods (applied for a specific user before the operation stage);

• System engineering methods (applied during operation), which in turn can be roughly divided into several subclasses, for example, low-speed methods (as in the solution of linear inequalities) and high-speed methods (as in adaptive processing of the neural network input signal).
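The connection between in-operation adjustment and linear inequalities can be illustrated with the classical perceptron error-correction rule. We use it here as an assumed example, not as the specific method of this book. The rule searches for a weight vector w such that w·z > 0 for every row z. Whenever such a w exists, it converges after a finite number of corrections. The function name, step and data are hypothetical.

```python
def solve_inequalities(Z, eta=1.0, max_epochs=100):
    """Find w with dot(w, z) > 0 for every row z of Z (perceptron correction rule)."""
    w = [0.0] * len(Z[0])
    for _ in range(max_epochs):
        corrections = 0
        for z in Z:
            if sum(wi * zi for wi, zi in zip(w, z)) <= 0:      # inequality violated
                w = [wi + eta * zi for wi, zi in zip(w, z)]    # move w toward z
                corrections += 1
        if corrections == 0:                                   # all inequalities hold
            return w
    raise RuntimeError("no solution found; the system may be inconsistent")

# Two-class samples (x1, x2) with labels +1 / -1. Multiplying each augmented
# vector (x1, x2, 1) by its label turns classification into inequalities w . z > 0.
samples = [((2.0, 1.0), 1), ((1.5, 2.0), 1), ((-1.0, -0.5), -1), ((-2.0, 1.0), -1)]
Z = [[label * v for v in (*x, 1.0)] for x, label in samples]
w = solve_inequalities(Z)
print(w)  # [2.0, 1.0, 1.0]
```

Each pass corrects the weights only after a violated inequality is found. This is one reason why such methods belong to the low-speed subclass.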


1.7 
Neural Computer Classification 


Designers commonly hold that neural networks have a much wider field of application than any other implementation of parallelism concepts, because massive parallelism is inherent in neural networks.

Figure I.5 shows the structure of the main neural computer types. It is presented in order to outline the main prospects for the architectural development of high-performance computers, in particular, neural computers.
Fig. I.5. Structure of the main types of neural computers

Type 1. Computers of this type are represented by the well-known EC computers, CM computers and personal computers.

Type 2. Neural computers of the simplest form, in which a neural algorithm is emulated in software on type-1 computers, providing an equivalent increase in performance during problem solving.

Type 3. This type includes single-processor computers (large-scale computers, minicomputers or personal computers) equipped with array processors. Computers of the DAP, IBM with FPS, STARAN, etc. types can serve as examples.

Type 4. Neural computers with hardware/software emulation of neural algorithms on type-3 computers.

Type 5. Computers of different classes, from supercomputers to multi-microprocessor computers, equipped with a small number of processors (usually no more than 2, 4, 8 or 16). The "Elbrus", WARP and Alliant computers are examples.

Type 6. Neural computers with software emulation of neural algorithms on type-5 computers.

Type 7. Neural computers with SIMD architecture equipped with several (no more than 2, 4 or 8) command processors.

Type 8. Type-7 neural computers with hardware/software implementation of the neural solution algorithm.

Type 9. Transputer-type computers with a large number of processors (dozens, hundreds or thousands).

Type 10. Implementation of neural solution algorithms on type-9 computers.
Type 11. Computers with MSIMD architecture equipped with a sufficiently powerful transputer or transputer-like kernel. The increased performance is achieved by adding a co-processor to each kernel processor, and each such co-processor is a unit with SIMD architecture.

Type 12. This computer type is similar to type-11 computers but is equipped with synchronously functioning clusters of processor elements implementing neural algorithms.


Note that computers of types 4, 8 and 12 include a hardware/software implementation of "neural clusters". Such an implementation is oriented specifically toward solutions based on neural algorithms, with a corresponding elemental base, architecture and program development system.

In our view, computer systems of types 9, 11 and 12 are currently the most promising.

In type-11 computers, the disadvantages related to property 1 are partially eliminated, and in type-12 computers the disadvantages related to property 2 are eliminated as well.


1.8 
Some Remarks Concerning the Neural Computer Elemental Base 


The problem of the neural computer elemental base is the most significant one in determining which types of neural computers will be designed in the near future. Classical technology lines of elemental base development oriented toward neural structures must be pursued along with new technology lines suited only to neural computers. Proposals for the development of a promising neural computer elemental base are formed on the basis of a wide range of functions. A detailed analysis of problem-solving efficiency with various technologies will make it possible in the future to establish preferences and priorities in implementing the neural computer elemental base.


The following neural computer elemental base can be considered top-priority:

• Custom-designed transputer-like 32-bit microprocessors, which will preserve the experience already accumulated in the design of transputer and transputer-like systems and allow the corresponding software already developed to be reused;

• Cascadable signal processors of the IMS A100 and IMS A110 types;

• VLSI circuit packages and memory micro-assemblies with high operating speed and large sample word length;

• Programmable logic ICs for implementing neural processing elements.


The following neural computer elemental base can be considered second-priority:

• Custom-designed digital CMOS neural chips;
• Optoelectronic GaAs neural chips;
• Analog CMOS neural chips.

The following neural computer elemental base can be considered third-priority:

• Systems of neural units on the slab board;
• Neural blocks based on molecular electronics;
• Quantum neural computers.


Most new designs of promising information processing facilities in Russia and abroad depend on ever smaller technology standards (minimum feature sizes) and an ever higher level of chip integration. On the one hand, the concept of neural computers increases the performance-to-price ratio of problem solving at current technology standards. On the other hand, it allows fundamentally new technologies (analog, optical, charge-coupled device, etc.) to be used for the development of high-efficiency systems. We believe this will make it possible to avoid the difficulties associated with the drive to shrink technology standards in digital VLSI manufacturing.

The importance of microelectronic technologies for the development of computer architecture is evident. Moreover, we believe that it is precisely technological development that gives rise to new types of computer architectures. This was the case in the mid-1970s, when the appearance of medium-scale ICs led to the development of computers with SIMD architecture. Similarly, in the early 1980s, the development of large-scale ICs led to the development of transputers and computers with MIMD architecture.

It was the development of microelectronic technologies that spurred the active development of neural computers in the second half of the 1980s. A transputer could objectively be designed only after a 32-bit microprocessor, on-chip memory and channel adapters could be manufactured on a single chip. Similarly, the active development of neural computers started once a cascaded segment of a neural network with adjustable or fixed coefficients could be implemented in hardware on a single chip.

It is important to note that when solution algorithms are represented in neural technology, one can avoid the abnormal (in our view) drive toward ever finer submicron technology. Such a drive was characteristic of computer systems based on processors of the i860, Power PC, Alpha, Merced, etc. types. This drive usually results in the following:

• Short-term, local and often illusory results in the development of domestic computing facilities based on imported microprocessors;

• Practically zero contribution to the development of domestic microelectronics, which constitutes the basis of future domestic computer science.


Neural computer designs will have an efficient performance-to-price ratio even with the available Russian 1.0-micron technology. Such designs will make the development of domestic very-high-performance computers a higher priority.

Design engineers developing a line of highly parallel computers with increased node performance must remember that the speed of information propagation in the human brain is very low. Therefore, one must think about developing promising highly parallel solution algorithms and architectures, including neural ones, rather than about the operating frequency of the node element. It is also necessary to remember that mankind has reached a level that makes it possible to create an engineering system consisting of 3-4 billion neurons (as in the human brain). However, nobody knows how to organize the system of connections between them.

Neural chip development is one of the main lines of neural computer design. The neural chip structure follows from the structures and adjustment algorithms developed for multilayer neural networks (in the case of general-purpose neural computers) and from neural network solution algorithms (in the case of problem-oriented and special-purpose neural computers).

However, this line of development will take some time. During this period it will be cost-effective for some applications to emulate the neural network solution algorithm on high-performance computers.

Note the low efficiency of single-processor workstation-type computers in solving complex problems in the neural network logical basis.

In addition, to emulate neural network algorithms on general-purpose microprocessor facilities, it is more effective to develop architectures oriented toward the execution of neural network operations than to use standard algorithms obtained by modifying a single-processor solution.

We consider the following classes of computer facilities:

• Single-processor computers (personal computers, middle-class computers, etc.);

• Small-processor computers (with a small number of processors);

• Multi-processor computers (computers with massive parallelization, transputer computers, pseudo-transputer computers, computers with a transputer kernel and peripheral processors of the i860, Alpha, Power PC, etc. types);

• Neural computers.


In Russian computer science, priority belongs to the development of neural computers.

At the present stage of development of microelectronic and related technologies, neural network technology has become suited not only to different types of microelectronic and semiconductor technologies but also to optical, optoelectronic, molecular, quantum and other technologies.

It must be mentioned that the appearance of slab board system technology and nanotechnology will lead to the development of new super-parallel architectures. It is already clear that neural network architecture is adequate to slab board technology (American and Japanese engineering designs). That is why attempts to build functional blocks with the old architecture, suited to single-processor computers, at the level of nano-elements can be regarded as a dead end. Starting from nano-neural elements, one can probably obtain fundamentally new architectural elements, and these will clearly be the elements of super-parallel, high-performance computers.


We believe that investigating the real structure of biological neural networks, with the aim of revealing structural features for future use in prospective computers, is practically useless without automated software technologies for processing cytological images. Such technologies must include specially designed software that not only provides fast, high-quality input of real images of neural tissue sections but also allows these images to be processed in-line.

At present, neural computer designers are not constrained by the properties of real, specific neural structures, because the structures they use, or the problems they consider, are relatively simple. As the requirements imposed on the neural structures used grow, a neurophysiological line of their investigation will develop. This will happen first of all in the processing of visual and acoustic sensor information.

Each new technology gives birth to a new class of computing architectures. This was the case with SIMD architecture at the end of the 1970s and the beginning of the 1980s, and with MIMD architecture in the early and late 1980s. Today it is happening with neurocomputers.

Investigations of real neural network structures aimed at the development of neurocomputers and molecular computers will lead to original architecture prototypes. The reason is that the technological principles by which biological and molecular neural networks are realized and evolve differ greatly from those currently used to realize VLSI and optical neural networks.


1.9 
Neural Mathematics — Methods and Algorithms 
of Problem Solving Using Neurocomputers 


A question that always arises is which class of problems is most adequate to computing devices based on new principles. For a long time it was believed that neurocomputers are efficient at solving so-called non-formalized or weakly formalized problems. Such problems usually require the solution algorithm to include a learning process based on real experimental data.

At present, this class has been extended by a second class of problems that does not require learning from experimental data but can nevertheless be represented well in the neural network logical basis. This second class is characterized first of all by pronounced natural parallelism, as in

• Signal processing;
• Pattern processing, etc.


Neural network algorithms will also be efficient for problems in which the input information space can be formed more efficiently by the Monte Carlo method than by standard analytic methods.

We believe that any problem can be solved much more effectively with a neurocomputer than with a standard computer, because the algorithm of any problem can be represented in the neural network logical basis with a controlled number of neural layers [I-9]. This means that, at the logical level, the neural network algorithm for solving any problem is much more parallel than any of its physical implementations. By contrast, in transputer and pseudo-transputer systems, a solution algorithm that is initially less parallel than the physical implementation has to be adapted to the more parallel physical implementation. This property fundamentally distinguishes neural computers from systems such as transputer systems or systems with a transputer kernel and peripheral processors of the i860, Alpha, Power PC, etc. types. All of the latter usually rely on algorithms originally developed for single-processor computers, which their designers modify in an effort to minimize the cost of information exchange between processors during the solution.
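Representing an algorithm "in the neural network logical basis with a controlled number of layers" can be illustrated with simple threshold elements with hand-chosen weights. This is a minimal sketch; the helper names and weights are our own. AND, OR and NOT are each realized by a single threshold element. XOR is not linearly separable, so it needs two layers. The two elements of the first layer are independent and could fire in the same step.

```python
def neuron(weights, threshold, inputs):
    """Threshold element: output 1 when the weighted input sum reaches the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def and_gate(a, b):
    return neuron((1, 1), 2, (a, b))

def or_gate(a, b):
    return neuron((1, 1), 1, (a, b))

def not_gate(a):
    return neuron((-1,), 0, (a,))

def xor_net(a, b):
    # Layer 1: two independent threshold elements, evaluated in the same step.
    h_or = or_gate(a, b)
    h_nand = neuron((-1, -1), -1, (a, b))  # NOT (a AND b)
    # Layer 2: one threshold element combines the hidden outputs.
    return and_gate(h_or, h_nand)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

The number of layers here (one for AND/OR/NOT, two for XOR) is fixed by the function being represented. This is the sense in which the layer count is "controlled" by the problem.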

One argument for the view that neural computers will be more efficient than any other architecture is the great expansion of the class of problems currently solved in the neural network logical basis. In addition to the problems mentioned above, one can also mention:

• Solution of high-dimensional systems of linear and nonlinear algebraic equations;

• Solution of systems of nonlinear differential equations;

• Solution of partial differential equations;

• Expert systems;

• Solution of optimization problems (linear and nonlinear programming) and other problems.

In all these problems, the transfer to the neural network logical basis is usually made when the dimensionality of the solution space increases sharply or when the solution time must be significantly reduced.

In general, two branches of neural mathematics are being developed: general neural mathematics and applied neural mathematics.

Neural mathematics is a field of computational mathematics in which problems are solved with algorithms in the neural network logical basis. The main goal of neural mathematics is to develop algorithms with a high degree of parallelism for formalized problems as well as for weakly formalized and non-formalized ones.

The criterion of neural network algorithm efficiency is the reduction in solution time compared with traditional methods. Comparing algorithm efficiency across different neural computer implementations is a separate problem that requires special investigation.

We shall call a neural network algorithm a computational procedure whose main part can be implemented by a neural network. Let P be a formal problem statement. P includes a set of initial data D and a set of objects R that must be determined. The basis for developing a neural network algorithm is a systematic approach in which the problem-solving process is represented as a dynamic system functioning in time. The system input is the data set D, and its output is the set of objects R. The objects of R are determined and obtain their values as a result of the problem-solving process.

The development of the neural network dynamic system that solves the posed problem consists of the following stages:


1. Determination of an object that represents the input signal of the neural network. It can be some element of the initial data, some initial value of the parameters to be determined, etc.;

2. Determination of an object that represents the output signal of the neural network. It can be the solution itself or some of its characteristics;

3. Determination of the desired output signal of the neural network;

4. Determination of the neural network structure:

(a) Number of layers;
(b) Connections between layers;
(c) Objects representing the weighting coefficients.

5. Determination of the system error function, i.e., a function that characterizes the deviation of the actual neural network output signal from the desired one;

6. Determination of the system quality criterion and of its optimization functional, which depends on the error;

7. Determination of the weighting coefficient values. This can be done in different ways, depending on the problem:

(a) Analytically, directly from the problem statement;
(b) By some computational methods;
(c) By a procedure of neural network coefficient adjustment.
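Stages 1-7 can be illustrated with the linear system A x = b, an example we chose ourselves rather than one taken from the book. One possible mapping is as follows:

- Stage 1: the input signal is the current approximation x.
- Stage 2: the output signal is y = A x, produced by a single linear layer.
- Stage 3: the desired output is b.
- Stage 4: the structure is one layer without a nonlinearity.
- Stage 5: the error is e = b - y.
- Stage 6: the quality functional is J = 0.5 * |e|^2.
- Stage 7(a): the weights are set analytically, equal to A.

Because the weights are fixed, the input signal is adjusted instead, by the gradient step x <- x + eta * A^T e. The function name, the step eta and the number of steps are assumptions.

```python
def matvec(A, x):
    """Output of a single linear layer whose weight matrix is A."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def solve_linear_system(A, b, eta=0.1, steps=200):
    n = len(A[0])
    x = [0.0] * n                                 # stage 1: input signal (initial approximation)
    for _ in range(steps):
        y = matvec(A, x)                          # stage 2: output; weights = A (stage 7a)
        e = [bi - yi for bi, yi in zip(b, y)]     # stage 5: error w.r.t. desired output b (stage 3)
        # The gradient of J = 0.5 * sum(e_i^2) with respect to x is -A^T e (stage 6),
        # so a gradient step adjusts the input signal by +eta * A^T e.
        g = [sum(A[i][j] * e[i] for i in range(len(A))) for j in range(n)]
        x = [xj + eta * gj for xj, gj in zip(x, g)]
    e = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    return x, 0.5 * sum(ei * ei for ei in e)      # solution and final quality functional

A = [[3.0, 1.0],
     [1.0, 2.0]]
b = [5.0, 5.0]        # the exact solution is x = [1, 2]
x, J = solve_linear_system(A, b)
print(x, J)
```

With fixed weights this is ordinary gradient descent on J (the Landweber iteration). It converges only for 0 < eta < 2 / lambda_max(A^T A), which for this matrix is about 0.15. Choosing eta is therefore part of the solution procedure.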


Solving a problem with a neural network algorithm consists of applying the designed computational procedure to specific numerical data. The solution process includes the following stages:

1. Determination of the specific neural network structure corresponding to the algorithm used;

2. Determination of the weighting coefficient values, or their direct retrieval from memory if these coefficients were found previously;

3. Generation of initial parameter approximations, if necessary;

4. Transmission of all numerical values to the neural network and activation of the network;

5. Functioning of the neural network in one of the following selected modes:

(a) A single-step mode or a mode with a fixed number of steps;

(b) A mode with a variable number of steps, depending on the required precision and/or on the specified numerical values of the parameters. In this case, the input signal is adjusted;

6. Obtaining the solution.


When these procedures are used repeatedly, steps (1) and (2) need to be performed only once.

We shall call a neural computer a computational system whose architecture supports steps 1-6 above.
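The two functioning modes of stage 5 can be sketched with a hypothetical driver `run` that repeatedly applies one network step. Mode (a) performs a fixed number of steps, and one step gives the single-step mode. Mode (b) keeps stepping until successive states differ by less than a required precision. The network step is an illustrative assumption: a single linear neuron y = w x whose input x is adjusted toward a desired output d. All numerical values are also assumptions.

```python
def run(step, x0, mode, n_steps=1, tol=1e-9, max_steps=10_000):
    """Drive a network step function in one of two modes and return
    (final state, number of steps taken).
    mode "fixed":    exactly n_steps steps (n_steps=1 is the single-step mode);
    mode "variable": step until successive states differ by less than tol."""
    x = x0
    if mode == "fixed":
        for _ in range(n_steps):
            x = step(x)
        return x, n_steps
    if mode == "variable":
        for k in range(1, max_steps + 1):
            x_new = step(x)
            if abs(x_new - x) < tol:
                return x_new, k
            x = x_new
        raise RuntimeError("required precision not reached")
    raise ValueError("mode must be 'fixed' or 'variable'")

# Hypothetical network: one linear neuron y = w * x; its input x is adjusted so
# that y approaches the desired output d (the exact answer is x = d / w = 3).
w, d, eta = 2.0, 6.0, 0.1

def step(x):
    return x + eta * w * (d - w * x)

x1, n1 = run(step, 0.0, "fixed", n_steps=1)    # x1 is about 1.2
x, k = run(step, 0.0, "variable", tol=1e-9)    # k depends on tol and on eta
print(x1, n1, x, k)
```

In mode (b) the step count is not known in advance. This matches the text: it depends on the required precision and on the numerical parameter values.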
Neural mathematics is a new field of computational mathematics oriented toward the design of algorithms for solving a wide class of problems with neurocomputers. The suggested approach to algorithm design draws both on well-known computational methods and on the knowledge already accumulated in neural network computation. However, it differs significantly from both.

Traditional numerical methods are used in neural mathematics only when they can be effectively parallelized and expressed in terms of neural network operations. Even then, these methods may have to be substantially reworked.

Practically all known approaches to neural network design amount to selecting and analyzing particular structural forms with known properties (Hopfield, Grossberg or Kohonen networks) or to analyzing specific modes of their functioning. The use of neural networks is then reduced to applying these structures, with some modification of their structural parameters, to suitable problems.

The starting point in neural mathematics is the problem statement, which determines the neural network structure adequate to the problem. If some adjustment is necessary, one uses the properties of the class of neural network structures that includes the obtained structure.

This class of neural network structures is usually fairly general (multilayer neural networks with sequential, cross and backward connections).

As a rule, a neural computer must be oriented toward fast execution of neural network operations and of parallelized neural network adjustment algorithms.

The development of the neural computer includes the following three parallel lines:

1. Development of the solution algorithms (neural mathematics);

2. Development of neural network theory, structure classes and methods of their adjustment;

3. Development of the neural computer as an assembly of hardware and software oriented toward the solution of neural mathematics problems.


All these lines of development are interconnected. On the one hand, the neural network structure for each problem is determined by the problem itself. On the other hand, the development of neural network theory leads to the use of increasingly complex neural network structures. Likewise, the level of the hardware used determines the possibilities for neural network and algorithm development. In turn, the development of neural mathematics drives the development of neural network theory, which then drives the hardware implementation.

At present, neural network research depends on neural computer development only in the software implementation of specific problems and their structures. In the future, both neural computer software and hardware will be determined by the problems being solved and by the neural network implementation.

Solving a problem on a neural computer depends on the adjustment procedure, which requires the choice of initial parameter values, the step value of the iterative method, etc.
Because not all of these processes are well formalized, and because they depend on the field of application, they are usually human-aided. Two types of dialogue occur in the solution procedure:

1. A dialogue during the preparation of the neural computer for solving a specific problem with predetermined constraints on the initial data and on the results. The main part of this dialogue is the adjustment of the weight coefficients. After the adjustment, the problem can be solved many times with different initial data, using the same set of problem parameters and the same neural network structure. When there is no weight coefficient adjustment procedure, this type of dialogue reduces to retrieving the required values from the neural computer memory.

2. A dialogue during the solution. It includes the generation of initial parameter values. However, the most labor-intensive part is the dialogue during input signal tuning. At this stage, the designer can analyze the dynamics of the system quality functional and select the step value of the adjustment method.

This second dialogue is critical for the solution process. Therefore, in tasks where minimizing the solution time is the most important criterion of neural network algorithm efficiency, this type of dialogue must be minimized by shortening or automating it. It can also be eliminated completely by fully automating the step value selection or by making the algorithm more elaborate. When this cannot be done, one can try to design the algorithm without input signal tuning, relying on weighting coefficient adjustment instead.
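Step value selection can be automated in a simple way with a monotone step-adaptation rule. We give it as an assumed illustration, not as the book's method. When the quality functional decreases, the trial step is accepted and the step is enlarged; when it does not decrease, the trial step is rejected and the step is halved. The sketch tunes the input signal x of a single sigmoid neuron y = sigmoid(w x + b) so that its output approaches a desired value d. The values of w, b, d, the initial step and the factors 1.5 and 0.5 are all hypothetical.

```python
import math

def sigmoid(s):
    """Numerically safe logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)

def tune_input(w, b, d, x=0.0, eta=10.0, tol=1e-12, max_iter=1000):
    """Adjust the input x of the neuron y = sigmoid(w*x + b) so as to minimize
    J(x) = 0.5 * (d - y)^2, choosing the step value eta automatically."""
    def J(x):
        return 0.5 * (d - sigmoid(w * x + b)) ** 2

    for _ in range(max_iter):
        y = sigmoid(w * x + b)
        grad = -(d - y) * y * (1.0 - y) * w   # dJ/dx by the chain rule
        x_trial = x - eta * grad
        if J(x_trial) < J(x):
            x, eta = x_trial, eta * 1.5        # J decreased: accept, try a larger step
        else:
            eta *= 0.5                         # J did not decrease: reject, halve the step
        if J(x) < tol:
            break
    return x, J(x), eta

x, J_final, eta = tune_input(w=2.0, b=-1.0, d=0.8)
print(x, J_final, eta)   # x should be close to (1 + ln 4) / 2
```

The rule only ever accepts steps that decrease the quality functional. It therefore replaces the human inspection of the functional's dynamics with a fixed decision rule, at the cost of occasional rejected trial steps.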


1.10 
About Neural Networks 


A neural network is a highly parallel dynamic system with directed-graph topology that produces output information through the reaction of its state to input actions. The processor elements, linked by directed channels, are called the nodes of the neural network.
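The directed-graph view can be sketched as follows. The sketch assumes an acyclic graph (no backward connections) in which each node is a processor element with a rectified-linear transfer function and each directed channel carries a weight. The node names, the weights, the activation and the use of Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to order the computation are all illustrative choices. Nodes returned together by `get_ready()` do not depend on one another, so they could fire in the same parallel step.

```python
from graphlib import TopologicalSorter

def evaluate(edges, inputs, activation=lambda s: max(0.0, s)):
    """edges: {node: {predecessor: weight}}; inputs: {input_node: value}.
    Returns all node values and the groups of nodes computed in each parallel step."""
    ts = TopologicalSorter({node: preds.keys() for node, preds in edges.items()})
    ts.prepare()
    values, steps = dict(inputs), []
    while ts.is_active():
        ready = ts.get_ready()                 # nodes whose predecessors are all computed
        group = [n for n in ready if n not in inputs]
        for n in group:
            s = sum(w * values[p] for p, w in edges[n].items())
            values[n] = activation(s)
        if group:
            steps.append(sorted(group))
        ts.done(*ready)
    return values, steps

# Hypothetical network: inputs x1, x2; hidden nodes h1, h2; output node out.
# The channel x1 -> out is a cross connection that skips the hidden layer.
edges = {
    "h1": {"x1": 1.0, "x2": -1.0},
    "h2": {"x1": 0.5, "x2": 0.5},
    "out": {"h1": 1.0, "h2": 2.0, "x1": 1.0},
}
values, steps = evaluate(edges, {"x1": 2.0, "x2": 1.0})
print(values["out"], steps)   # 6.0 [['h1', 'h2'], ['out']]
```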

Essentially, neural networks are a formal tool for describing the main part of a solution algorithm based on a neural network. The framework of the present book, as of the book [I-6], is a system approach to neural network synthesis, i.e., an approach to the design of the neural networks themselves and of their adaptation algorithms, similar to that used for classical adaptive control systems.

We believe that four approaches to neural network investigation are possible:

1. Psychological approach, in which some psychological paradigm must be modeled. This requires developing and investigating a neural network with a definite structure.

2. Neurophysiological approach, in which the neural network is developed and investigated on the basis of knowledge about the structure of some part of the brain, and models the functions of that part.

3. Algorithmic approach, in which some mathematical problem is formulated. On the basis of this formulation, an adequate neural network is designed together with the corresponding algorithm tailored to the solution.

4. Systematic approach, which combines all the approaches above and forms the framework of this book. Figure I.6 shows the general structure of the neural network synthesis described in the present study.


Fig. I.6. System approach - the multilayer neural network synthesis
[Figure blocks: neural network determination, analysis and synthesis; investigation of neural network signal characteristics; neural network optimal model design; fixed structure adjusted according to the closed cycle]
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1.10.1 
Neural Network Structures 


The automata theory and the theory of Boolean elements in the 1940s, 1950s and 1960s 
formed the basis for the development of architecture and separate units of single-pro- 
cessor computers. The same theory of automata based on Boolean elements continues 
to serve as the logical basis for small-processor, transputer and similar computers, as 
well as for computers with SIMD architecture in which a network of single-bit proces- 
sors (STARAN, etc.) represents a peripheral processor. 

Similarly, the neural network theory is the logical basis for neural computers. And 
this has already been the case for several decades (1950s, 1960s and 1970s). At present, 
this fact has become more evident due to the revolutionary development of the neural 
computer field. 

A neural network is a network with a finite number of layers consisting of individual neuron-like elements, with different types of connections between the layers. The number of neurons in the layers is chosen to be sufficient to provide the required problem-solving quality, while the number of layers should be as small as possible in order to reduce problem-solving time.
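As an illustration of this layered description, the following minimal Python sketch builds a two-layer network of threshold elements with 0/1 outputs that computes XOR, a function no single threshold element can realize. The weights and thresholds are hand-picked for this example and are not taken from the book.

```python
def step(g: float) -> int:
    """Threshold nonlinearity with 0/1 output: 1 if g >= 0, else 0."""
    return 1 if g >= 0 else 0


def threshold_layer(x, weights, thresholds):
    """One layer of threshold elements; row j of `weights` feeds neuron j."""
    return [step(sum(w * xi for w, xi in zip(row, x)) - t)
            for row, t in zip(weights, thresholds)]


def xor_network(x1: int, x2: int) -> int:
    # Layer 1 (hypothetical weights): an OR-like neuron and an AND-like neuron.
    hidden = threshold_layer([x1, x2], [[1, 1], [1, 1]], [0.5, 1.5])
    # Layer 2: fires when the OR neuron is on and the AND neuron is off.
    (out,) = threshold_layer(hidden, [[1, -1]], [0.5])
    return out


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_network(a, b))
```

Two layers suffice here; adding more would only lengthen the signal path, which is why the text asks for as few layers as the problem allows.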

This book is dedicated to the description of neural networks with different structures. The objective conditions for the transfer from the Boolean to the threshold basis in computer engineering are given. The main types of threshold elements serving as neuron analogues are described. The reasons for investigating multilayer neural networks are analyzed; interest in such networks appeared in the 1960s after the publication of the classical work by Rosenblatt [I-1]. Multilayeredness is regarded as a specific property of the structure of the transformation performed by the open-loop system in its topological, rather than symbolic, representation.

Rosenblatt [I-1] investigated multilayer systems whose layers consist of topologically equivalent elements connected to the elements of other layers. The sensor elements form a layer that serves as the signal source for the associative elements of the three-layer perceptron. The associative elements, in turn, form the next layer, whose inputs are the output signals of the sensor elements. A multilayer system is a system of elements grouped into separate layers with topologically equivalent properties and with different characteristics of the connections between layers. Different types of multilayer neural networks are considered, including networks with sequential connections, cross-connections and backward connections, as well as continual neural networks. Some special types of neural networks suggested by different authors are also considered.

The following main advantages of neural networks as a logical basis for complex solution algorithms can be mentioned:


* Invariance of neural network synthesis methods with respect to the dimensionality of the feature space;

* Suitability for modern promising technologies;

* Fault tolerance, in the sense that problem-solving quality changes monotonically, rather than catastrophically, with the number of failed elements.
The main goal of this section is to explain why a system aimed at solving a specific problem should be designed precisely in the form of a neural network, and how to choose the topology of this neural network (the number of layers, the number of elements in each layer, and the characteristics of the connections).


1.10.2 
Investigation of Neural Network Input Signal Characteristics 


In the first chapter of [I-6], in investigating the characteristics of neural network input signals for the widespread task of pattern recognition, the notion of teacher (supervisor) qualification is introduced into the input signal distribution functions. These functions include the well-known modes of learning and self-learning as particular cases. In the general case, the teacher qualification is introduced differently for patterns belonging to objectively different classes. It is also shown that more specific input signal characteristics can be introduced; for example, the “teacher’s slant about his capabilities” is introduced.

Formally, the problem of neural network learning consists in approximating a given sample function of the teacher’s instructions by an automaton with given properties. Formally, the problem of self-learning is the selection, in the input signal space, of the regions corresponding to the modes of the pattern distribution function at the input. The formal problem of supervised neural network learning with a teacher of limited qualification is a generalization of the first two statements.

Existing investigations in the field of pattern recognition relate mainly to stationary patterns, for which the distribution of the neural network input signal does not depend on time. The present book deals with non-stationary patterns, for which the input signal distribution is time-dependent.


1.10.3 
About the Selection of Criteria for Primary Neural Network Optimization 


Criteria from statistical decision theory are usually taken as the criteria for primary multilayer neural network optimization in the pattern recognition learning mode. Examples of such criteria are the minimum of the average risk function under the condition that the components of the average risk function for patterns of different classes are equal, and the minimum of the average risk function under the condition that the component for one of the classes has a given value.
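To make the two constrained criteria concrete, here is a toy Python sketch under loudly stated assumptions: a single feature x, two Gaussian classes with unit variance, assumed priors, unit losses, and one decision threshold t. In this one-dimensional setting each constraint alone already fixes t, so the sketch simply solves for it and compares the result with the unconstrained minimum-risk threshold. All numbers and names are hypothetical.

```python
import math

# Hypothetical toy problem: one feature x, class 1 ~ N(-1, 1), class 2 ~ N(+1, 1).
P1, P2 = 0.7, 0.3     # assumed prior probabilities of the two classes
M1, M2 = -1.0, 1.0    # assumed class means (unit variances)


def pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def risk_components(t):
    """Per-class components of the average risk for the rule 'class 2 if x >= t' (unit losses)."""
    r1 = P1 * (1.0 - cdf(t - M1))  # class-1 patterns assigned to class 2
    r2 = P2 * cdf(t - M2)          # class-2 patterns assigned to class 1
    return r1, r2


def average_risk(t):
    return sum(risk_components(t))


def bisect(f, lo, hi, iters=200):
    """Root of a function that changes sign exactly once on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (f(lo) < 0) == (f(mid) < 0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Unconstrained minimum of the average risk: solve dR/dt = 0.
t_bayes = bisect(lambda t: -P1 * pdf(t - M1) + P2 * pdf(t - M2), -5.0, 5.0)
# Criterion with equal class components of the risk.
t_equal = bisect(lambda t: risk_components(t)[0] - risk_components(t)[1], -5.0, 5.0)
# Criterion with a given (assumed) value 0.05 of the class-1 component.
t_fixed = bisect(lambda t: risk_components(t)[0] - 0.05, -5.0, 5.0)

for name, t in [("min risk", t_bayes), ("equal parts", t_equal), ("fixed part", t_fixed)]:
    print(f"{name}: t = {t:.4f}, R = {average_risk(t):.4f}")
```

For these Gaussians the unconstrained optimum has the closed form t = ln(P1/P2)/2, which gives a check on the numerical search; the constrained thresholds trade a higher total risk for control over how it is split between the classes.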

A precondition for forming the criterion and the functional for primary neural network optimization in the learning mode is the representation of the input signal distribution density as a multi-modal function, in which each mode corresponds to some class with some probability. At the initial investigation stage, modifications of the average risk function are used as criteria for primary neural network optimization in the self-learning mode. This criterion requires a natural generalization when passing to a continuum of classes and solutions. A separate question considered in the book is the formation of the primary optimization functional in the case of arbitrary teacher qualification.
1.10.4
Analysis of Open-Loop Neural Networks 


The formal technique used in the analysis of open-loop systems is based on exact methods of probabilistic analysis of multidimensional nonlinear systems. The analysis focuses mainly on the distributions of errors and their moments because the results of such analysis are formally independent of the neural network’s complexity and type; they depend only on the characteristics of the feature space and the solution space. This significant observation is widely used later, at the stage of selecting or forming the secondary optimization functional, as well as during closed-loop neural network development.

The secondary optimization functional is a functional expressed through the parameters of the current signal distributions in the neural network. It is minimized directly in multilayer neural networks during closed-cycle adjustment. At this synthesis stage, two problems are mainly considered.

The first problem concerns the correspondence between the secondary optimization functionals used in some known works and particular primary optimization criteria. At issue here are known adaptive systems such as Adaline, the Steinbuch learning matrix, or Rosenblatt’s three-layer perceptron (more precisely, its adjustable output unit). The main disadvantage of these approaches is that the correspondence between the secondary optimization functionals they use and specific primary optimization criteria is not analyzed. As a result, some of these systems are practically inoperable when the input signal has multi-modal distributions.

The second problem concerns the formation of the secondary optimization func- 
tional corresponding to the given primary optimization criterion. A correspondence 
in this case is considered in the sense of the coincidence of neural network parameters 
under the minimum of primary and secondary optimization functionals. The book 
describes a general method for forming the secondary optimization functional corre- 
sponding to the given primary optimization criterion. The results of applying such a 
method for multilayer neural networks of different structures and for different pri- 
mary criteria are shown. 


1.10.5 
Algorithms for a Multivariable Functional Extremum Search 
and Design of Adaptation Algorithms in Neural Networks 


Algorithms for a multivariable functional extremum search and their use for the de- 
sign of adaptation algorithms in neural networks are described in the sixth and sev- 
enth chapters of [I-6]. 

The search procedure for the extremum of the secondary optimization functional is widely discussed in the literature. We shall mainly consider the feasibility and expediency of using various gradient procedures (Newtonian, relaxation, steepest descent, stochastic approximation, etc.) to search for a local extremum.
The use of iterative methods for designing multivariable functional extremum search algorithms has some peculiarities in the development of adaptive systems [I-10]. These peculiarities stem mainly from the fact that, when the characteristics of the input signal are unknown (the so-called a priori insufficiency), nothing can be said about the form of the secondary optimization functional, even when the neural network structure is fixed. One can only say that this functional has several local extrema, and that all of them, or at least some of them, must be found during closed-cycle adjustment. It is precisely this fact that makes it necessary to introduce a random element into the search procedure: a set of random initial conditions is chosen for some gradient procedure. Hence, the main problem is to find the probability of revealing a given number of local extrema of the secondary optimization functional as a function of the number of random initial conditions drawn for the gradient procedure of local extremum search. One of the problems that must be solved at the stage of closed-system design is the estimation of the gradient vector of the secondary optimization functional in the neural network. This can be done in two ways:


1. Introduction of search oscillations and detection; 
2. Representation of gradient vector estimation in the form of expression through the 
signals in neural networks (output and intermediate signals). 


In the first case, one deals with the adaptive search system. In the second case, 
one deals with the analytical system. It is evident that the design of neural net- 
works in the form of analytical systems adjusted in the closed cycle is prefer- 
able because the introduction of search oscillations adds noise to the system. 
However, the design of closed-loop neural networks using analytical methods is not 
always possible. The limitations of the analytical approach are shown in the present 
book while describing the stage of closed-loop neural network design. The main 
attention in the stage of closed-loop neural network design is paid to the imple- 
mentation of given primary optimization criteria in the neural networks of different 
structures. 
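The difference between the two ways of estimating the gradient can be shown on a toy example. The Python sketch below assumes, hypothetically, a single neuron y = tanh(a·x) trained on four made-up samples with a mean-squared-error secondary functional. A central-difference probe of each weight stands in for search oscillations, while the analytical estimate is built only from the neuron’s own signals (output, error and input).

```python
import math

# Hypothetical training samples (inputs, desired outputs) for one neuron y = tanh(a . x).
X = [(1.0, 0.5), (-0.3, 1.2), (0.8, -1.0), (-1.5, -0.2)]
D = [1.0, -1.0, 1.0, -1.0]


def outputs(a):
    return [math.tanh(sum(ai * xi for ai, xi in zip(a, x))) for x in X]


def functional(a):
    """Secondary functional (assumed): mean squared error of the outputs."""
    return sum((y - d) ** 2 for y, d in zip(outputs(a), D)) / len(X)


def analytic_gradient(a):
    """Gradient expressed through the neuron's own signals: output y, error y - d, input x."""
    grad = [0.0] * len(a)
    for x, y, d in zip(X, outputs(a), D):
        common = 2.0 * (y - d) * (1.0 - y * y) / len(X)  # d/dg of (y - d)^2, averaged
        for j, xj in enumerate(x):
            grad[j] += common * xj
    return grad


def search_gradient(a, delta=1e-4):
    """Gradient from +/- delta probes of each weight (stand-in for search oscillations)."""
    grad = []
    for j in range(len(a)):
        up, down = list(a), list(a)
        up[j] += delta
        down[j] -= delta
        grad.append((functional(up) - functional(down)) / (2.0 * delta))
    return grad


a = [0.3, -0.2]
print(analytic_gradient(a))
print(search_gradient(a))
```

Both estimates agree here, but the probing version needs two extra evaluations of the functional per weight, and in a working system the probing itself is the source of the added noise mentioned above.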

An important problem is the development of the neural network adjustment algo- 
rithms in the self-learning mode and arbitrary teacher qualification mode. The meth- 
ods of closed-loop system design here are the same as in the learning mode. This is the 
main idea of the universal approach to the processes of learning and self-learning that 
constitutes the basis for multilayer neural network synthesis described in [I-6] and in 
the present study. 

An analysis of the known heuristic algorithms for neural network adjustment, based on the idea of a universal approach to the synthesis of adaptation algorithms in neural networks, is presented in [I-6]. That work also presents material on adaptation algorithms in continuum neural networks and on the selection of adaptation algorithms suited to their physical implementation.

The peculiarities of the initial condition selection in the neural network adaptation 
algorithms are also considered. 
1.10.6
Investigation of Neural Network Adaptation Algorithms 


Investigation of closed-loop neural networks is the final stage of investigation of 
multilayer neural networks with fixed structure and with closed-cycle adjustment. This 
synthesis stage deals with some problems related to the open-loop multilayer neural 
network’s performance quality estimation. 

The first of these problems is the selection of initial conditions for adjusting the coefficients of the multilayer neural network. As mentioned above, the secondary optimization functional has multiple extrema. This is why two methods of initial condition selection are usually considered: a random method, when all local and global extrema must be found, and a deterministic method, when the multilayer neural network is placed in the domain of the global extremum of the secondary optimization functional.
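The random method can be sketched in Python as multi-start gradient descent. The functional below is a hypothetical one-dimensional stand-in with one global and one non-global local minimum; the step size, iteration count, sampling interval and merging tolerance are illustrative assumptions. The hit counts give a rough empirical estimate of how often each extremum is revealed for a given number of random initial conditions.

```python
import random


def functional(w):
    """Hypothetical multi-extremal functional with two local minima."""
    return w ** 4 - 3.0 * w ** 2 + w


def gradient(w):
    return 4.0 * w ** 3 - 6.0 * w + 1.0


def descend(w, step=0.01, iters=2000):
    """Plain gradient descent from the initial condition w."""
    for _ in range(iters):
        w -= step * gradient(w)
    return w


def multi_start(n_starts, lo=-2.0, hi=2.0, seed=0, tol=1e-3):
    """Run descent from random initial conditions; return {minimum found: number of hits}."""
    rng = random.Random(seed)
    found = {}
    for _ in range(n_starts):
        w = descend(rng.uniform(lo, hi))
        for key in found:
            if abs(key - w) < tol:   # same extremum reached again
                found[key] += 1
                break
        else:                        # a new extremum has been revealed
            found[w] = 1
    return found


minima = multi_start(50)
best = min(minima, key=functional)
print(sorted(minima.items()), "global:", best)
```

Picking the deepest of the revealed minima with `min(minima, key=functional)` corresponds to keeping the best adjustment found; the deterministic method would instead start directly inside the basin of that minimum.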

The second of these problems is the selection of a class of typical input signals for the multilayer neural network that is complete enough to allow the network’s performance quality to be estimated later. For automatic control systems, this problem has already been solved, in particular by selecting the class of polynomial input signals as the typical one; the signal complexity is then determined by the polynomial order. For multilayer neural networks, the input signal complexity is determined, in particular, by the modality of the conditional distribution of input signals in the pattern recognition task.
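By analogy with polynomial test signals, a class of typical input signals can be parameterized by modality. The Python sketch below is our own illustration: it draws one-dimensional samples from an equal-weight Gaussian mixture whose number of centers plays the role of the signal’s order. The centers, spread and seeds are hypothetical.

```python
import random


def multimodal_class_samples(centers, sigma, n, seed=0):
    """Draw n samples of a 1-D class whose density is an equal-weight Gaussian mixture.

    The number of centers (modes) plays the role of the test signal's 'order'.
    Returns a list of (x, mode_index) pairs.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        k = rng.randrange(len(centers))          # pick a mode uniformly
        samples.append((rng.gauss(centers[k], sigma), k))
    return samples


# Hypothetical test pair: a bimodal class interleaved with a unimodal one.
class_a = multimodal_class_samples([-3.0, 3.0], 0.5, 200, seed=1)
class_b = multimodal_class_samples([0.0], 0.5, 200, seed=2)
print(len(class_a), len(class_b))
```

A bimodal class at ±3 paired with a unimodal class at 0 already yields data that no single threshold on x can separate, which is the kind of complexity the modality is meant to capture.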

The third of these problems is the problem of selecting optimal parameters for the 
multilayer neural network tuning circuit. In particular, it is necessary to select a para- 
metric matrix of the system of secondary optimization functional extremum search. 
Special attention here is paid to the selection of optimal parameters of the neural 
network tuning circuit based on the estimation of the primary optimization functional 
value. The results of investigations of a large number of multilayer neural networks 
obtained by means of computer modeling are presented in the studies [I-1, I-6]. 

In general, the following aspects of this multilayer neural network synthesis stage must be taken into account. Considering a non-formal class of pattern recognition problems with unknown and sufficiently complex distribution density functions causes difficulties not only in developing such systems but also in attempting to estimate theoretically the quality of the solution of these problems. That is why investigators mainly use methods of statistical modeling. The following results have already been published: results of statistical modeling of individual neurons under multi-modal input signal distributions, investigations of the adjustment dynamics of three-layer neural networks with sequential connections, and investigations of the adjustment dynamics of multilayer neural networks with non-stationary input signals in the self-learning mode and in the supervised learning mode with a teacher of finite qualification.

A concrete neural network, i.e., an array of processors with specified connections implementing adaptation algorithms, is a dynamic system described by a system of differential or difference equations [I-8, I-11]. Representing a neural network with closed-loop adaptation circuits as a linear sequential machine makes it possible to describe its functioning in terms of the classical z-transform. This allows one to use the classical methods of statistical dynamics of continuous and discrete automatic control systems. This is practically the only efficient way to analyze quantitatively the dynamics of a processor array during problem solving, and it is precisely the neural computer concept that provides it. In the future, this approach makes it possible to analyze and synthesize structures of distributed computers in the form of processor arrays belonging to the neural computer class.
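A minimal instance of this description, assuming a single linear neuron with a constant input x and desired output d adjusted by the Widrow-Hoff (LMS) rule, is a first-order linear difference equation. Its z-transform has a single pole at z = 1 - mu*x^2, so the adjustment converges exactly when 0 < mu < 2/x^2. The Python sketch below iterates the rule and checks it against the closed-form solution; all values are hypothetical.

```python
def lms_trajectory(w0, x, d, mu, steps):
    """Iterate w(n+1) = w(n) + mu * (d - w(n) * x) * x for a one-weight linear neuron."""
    w = w0
    traj = [w]
    for _ in range(steps):
        w = w + mu * (d - w * x) * x
        traj.append(w)
    return traj


def closed_form(w0, x, d, mu, n):
    """Solution of the same difference equation: w* + (w0 - w*) * p**n, pole p = 1 - mu*x^2."""
    p = 1.0 - mu * x * x
    w_star = d / x
    return w_star + (w0 - w_star) * p ** n


stable = lms_trajectory(0.0, 2.0, 1.0, 0.1, 20)    # pole 0.6: converges to d/x = 0.5
unstable = lms_trajectory(0.0, 2.0, 1.0, 0.6, 20)  # pole -1.4: diverges with oscillation
print(stable[-1], unstable[-1])
```

The pole location alone predicts both runs, which is the practical benefit of treating the adjustment loop as a discrete linear system.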


1.10.7 
Multilayer Neural Networks with Flexible Structure 


As mentioned above, multilayer neural networks with fixed structure and closed-cycle adjustment attain the optimum of the optimization functional for arbitrary, a priori unknown conditional probability distribution densities of the input signal. However, the potential quality of such neural networks is limited by the a priori information about the structure of the open-loop system. Synthesis methods for neural networks in which part of the open-loop structure cannot be fixed a priori and, together with the values of the adjustable coefficients, is a result of the adjustment process are considered in the ninth chapter of [I-6] and in the present study. The number of layers and the number of neurons in each layer are determined during adjustment. This book also considers several variants of the design of neural networks with flexible structure. It describes the peculiarities of investigating adjustment procedure dynamics at the level of analyzing how the optimization functional depends on the number of layers and the number of neurons in each layer. As a result, a neural network with flexible structure is implemented in the form of a uniform multilayer neural network.


1.10.8 
Informative Feature Selection in Multilayer Neural Networks 


[I-6] attempted to analyze, from a single viewpoint, the numerous and very diverse works dedicated to informative feature selection, with the aim of developing so-called structural methods related to the methods of multilayer neural network synthesis.

The author considers the widespread view that so-called preliminary feature selection is possible to be invalid. The reason is that any feature selection procedure must, directly or indirectly, use some concrete neural network. That is why any feature selection procedure is subjective, and its subject is a neural network of a specific type.

The author’s second argument in favor of the suggested approach is the “absoluteness” of the primary optimization functional as an index of the information value of features. This is why estimates based on divergence, averaged conditional entropy, etc., are rough and partial.

These considerations suggest that the problem of informative feature selection should be analyzed after the synthesis procedures have been completed and the neural network’s dynamics have been investigated. In the author’s opinion, multilayer neural networks with fixed and flexible structures have the lowest degree of subjectivity with respect to the input signal (which is the subject of investigation with the help of neural networks). The reason is that these neural networks are synthesized assuming that no information about the conditional distribution densities of patterns within classes is available. That is why the usual approach is to use the multilayer neural network to search for the maximally informative features of the initial feature space.

The use and investigation of multilayer neural networks allows one to formulate the problem of selecting the most informative features of intermediate spaces rather than of the initial feature space. These intermediate spaces are formed by the output signals of the neurons of the first, second and subsequent layers of the neural network. This problem can be regarded as the problem of structure minimization (minimizing the number of neurons in each layer) of the multilayer neural network after the coefficient adjustment procedure is finished.


1.10.9 
Investigation of Neural Network Reliability 


The problem of neural network reliability is currently at a very early stage of development. It is evident that its solution will have a revolutionary influence on the implementation of neural computers on the basis of fundamentally new technologies, for example, the technology of slab board system design. The ability of the perceptron structure to preserve its functional capability when some of its elements break down was already reported by Rosenblatt for the three-layer perceptron [I-1]. Modern computers do not possess this property because they lack explicit redundancy. Computers with MIMD (Multiple Instructions - Multiple Data) architecture seem to be the only exception. These computers are characterized by the asynchronous functioning of the processor array, which provides so-called gradual system degradation when some of its elements break down. A similar property exists in neural computers, and this is their great advantage. Methods for investigating the functional reliability of neural networks are presented below. In particular, we describe some experimental methods for functional reliability investigations, as well as some methods for investigating parametric and functional reliability under catastrophic failures. Some methods of developing and investigating restoring organs based on neural networks are considered separately.


1.10.10 
Neural Network Diagnostics 


The structure of a neural computer is specific to the class of computers designed in the form of a processor array. This imposes special requirements on the diagnostic procedure for those parts of the neural computer that represent a neural network. The basis for such diagnostic methods was developed in [I-12, I-13]. This book describes some methods of forming the notion of neural network failure, some algorithms for localizing neural network failures, methods of minimum test design, and methods of adaptive diagnostics of neural network failures. In principle, the described methods can serve as the basis for developing neural computer operating systems for neural network testing.
1.11
Conclusions 


As a result of works published in the 1960s, 1970s and 1980s, a line of research in neural network theory was formed that preceded the corresponding foreign research.

The following methods of adaptive neural network adjustment were developed: 


* With arbitrary neuron form;

* With arbitrary number of layers;

* With different connections between layers (direct, cross and backward connections);

* With different forms of optimization criteria;

* With different limitations on the neural network weighting coefficients.


The author’s basic position in the present study is not to use a neural network of some given structure preferred by a particular investigator, but to search for a neural network structure, and for methods of its adjustment, adequate to the problem being solved.

The content of the book represents the author’s current results in the field of neural 
network investigations. The main line is the development of the neural network theory. 

In general, neural computers represent a promising line of development of modern high-performance computers. Neural network theory and neural mathematics represent a leading line of development of Russian computer science, and they require support. The basis for the development of these lines is applied computer systems consisting of neural computers, which must be designed in the near future.

The development of three lines, namely neural network solution algorithms, neural network theory and neural computers, is tightly interrelated:


* On the one hand, the neural network structure for each task is determined by the task itself; on the other hand, the development of neural network theory encourages the use of more complex neural network structures;

* On the one hand, the level of technology determines the possibilities of designing neural networks and neural network algorithms; on the other hand, the development of neural mathematics stimulates neural network theory, which in turn determines the development of technology;

* At present, the line of neural network investigations is interrelated with the line of neural computer development only in the domain of software implementation of the solutions of different problems and of the corresponding structures. In the future, both the hardware and the software components of the neural computer will be determined mainly by the problems being solved and by neural network structures.


Neural computers represent an efficient symbiosis of computer science, adaptive 
system automated control theory, and neurodynamics. 

The list of additional domestic literature concerning neural computers is given in 
the appendix to this section. 
Chapter 1


Transfer from the Logical Basis of Boolean Elements 
“AND, OR, NOT” to the Threshold Logical Basis 


A fundamental step toward the appearance of the neural computer is the abandonment of the Boolean logical basis at the level of computer elements and the transition to the threshold logical basis. The latter models some functions of the nerve cell. This transition changes not only the computer’s element basis but also its entire architecture.


1.1 
Linear Threshold Element (Neuron) 


The works concerning threshold logic appeared in the 1960s and 1970s [1-1 to 1-8]. 
They propose the use of neurons for the design of separate computer units. Here, these 
neurons perform the following logical transformation of input signals into output ones: 


$$y = \operatorname{sign}\left(\sum_{i=0}^{N} a_i x_i\right) \qquad (1.1)$$


This is the simplest interpretation of the neuron transfer function. Here, $y$ is the neuron output; $a_i$ are the weighting coefficients; $a_0$ is the threshold; $x_i$ are the neuron input values ($x_i \in \{0, 1\}$); and $N$ is the dimensionality of the neuron input signal. The nonlinear transformation of the neuron in this case is the following:

$$\operatorname{sign}(g) = \begin{cases} 0, & g < 0, \\ 1, & g \ge 0. \end{cases} \qquad (1.2)$$

Figure 1.1 shows a functional block diagram of a neuron. In the particular case when $a_i = 1$ ($i = 1, \dots, N$), the neuron is a majority element, and the threshold equals $a_0 = N/2$.
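The threshold element and the majority special case can be checked with a short Python sketch. Expression (1.1) folds the threshold $a_0$ into the sum as the $i = 0$ term; the sketch assumes the corresponding constant input is $x_0 = -1$, so that the threshold is subtracted from the weighted sum, which is the reading under which $a_0 = N/2$ yields a majority element. The helper names are hypothetical.

```python
from itertools import product


def step(g: float) -> int:
    """Nonlinearity (1.2): 0 for g < 0, 1 for g >= 0."""
    return 1 if g >= 0 else 0


def neuron(x, a, a0):
    """Linear threshold element: weighted input sum compared with the threshold a0."""
    g = sum(ai * xi for ai, xi in zip(a, x)) - a0   # equivalent to the i = 0 term with x0 = -1
    return step(g)


def majority(x):
    """Majority element: all weights a_i = 1 and threshold a0 = N/2."""
    n = len(x)
    return neuron(x, [1] * n, n / 2)


def truth_table(fn, n):
    """Evaluate fn on every binary input vector of length n."""
    return {x: fn(x) for x in product((0, 1), repeat=n)}


print(truth_table(majority, 3))
```

For odd N the weighted sum never equals N/2 exactly; for even N a tie (exactly half the inputs on) gives output 1 under form (1.2).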

The threshold function in the transfer function (1.1) is not restricted to expression (1.2); it can be chosen arbitrarily, and a different choice changes the coefficients $a_0$ and $a_i$. As a rule, one uses either form (1.2) of the threshold function or the following form:

$$\operatorname{sign}(g) = \begin{cases} -1, & g < 0, \\ 1, & g \ge 0. \end{cases}$$


The choice depends on whether this function is physically implemented in analog or digital form.
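One concrete way the choice of form shows up in the coefficients, offered as our own illustration rather than the book’s derivation: when a neuron’s 0/1 output is replaced by the −1/+1 form and fed to another neuron, that neuron must use rescaled coefficients to keep the same decisions. With $u_i = 2x_i - 1$, one has $\sum a_i x_i - a_0 = \sum (a_i/2) u_i - (a_0 - \sum a_i / 2)$. The Python sketch below checks this identity; the function names are hypothetical.

```python
from itertools import product


def sgn01(g):
    """Form (1.2): 0 for g < 0, 1 for g >= 0."""
    return 1 if g >= 0 else 0


def sgn_bipolar(g):
    """Two-valued form: -1 for g < 0, +1 for g >= 0."""
    return 1 if g >= 0 else -1


def to_bipolar(v):
    """Map a 0/1 value onto the -1/+1 alphabet."""
    return 2 * v - 1


def weighted(x, a, a0):
    """Weighted input sum minus the threshold."""
    return sum(ai * xi for ai, xi in zip(a, x)) - a0


def rescale_for_bipolar_inputs(a, a0):
    """Coefficients that give the same weighted sum when inputs switch from 0/1 to -1/+1."""
    return [ai / 2 for ai in a], a0 - sum(a) / 2


def equivalent_on_all_inputs(a, a0):
    """True if the original and rescaled neurons decide identically on every input."""
    b, b0 = rescale_for_bipolar_inputs(a, a0)
    return all(sgn01(weighted(x, a, a0)) == sgn01(weighted([to_bipolar(v) for v in x], b, b0))
               for x in product((0, 1), repeat=len(a)))


print(rescale_for_bipolar_inputs([1.0, -2.0, 0.5], 0.25))
print(equivalent_on_all_inputs([1.0, -2.0, 0.5], 0.25))
```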
Fig. 1.1. Functional block diagram of a neuron


The following advantages of a neuron with respect to the Boolean elements AND, 
OR, NOT, etc., can be mentioned: 


1. A neuron performs more complex logical functions. This allows a given logical function to be implemented with a smaller number of elements and, as a result, reduces the amount of equipment needed to build computer units;

2. Neural networks are highly tolerant of failures of individual elements. It was noted in [I-1] that in the three-layer perceptron, the failure of a number of first-layer neurons with arbitrary connections did not cause a sharp, catastrophic drop in problem-solving quality, even without special diagnostic procedures or structure reconfiguration. At present, such an ability is observed only in multi-microprocessor computers with extended parallelism and SIMD (Single Instruction - Multiple Data) and MIMD (Multiple Instructions - Multiple Data) architectures, when diagnostic and reconfiguration procedures are implemented on these structures;

One of the properties of such structures is so-called gradual degradation: the main measure of system performance (the probability of correct recognition and efficiency) is maximal when all elements function properly, and it decreases, but not catastrophically, when elements fail;

3. Neural networks have increased tolerance to variations of their circuit parameters. This property can be illustrated by a simple logical function of two variables implemented by a neuron (Fig. 1.2): rather large variations of the weighting coefficients and the threshold value do not cause errors in the implementation of this logical function.

4. Algorithms suited to neural networks can be implemented in analog form on VLSI or optical devices, which offers much higher operating speed than digital implementation.

Maximal parallelization of information processing is achieved when sufficiently “large” mathematical operations of high dimensionality are implemented as hardware neural networks.


Fig. 1.2. Neuron tolerance against variation of weighting coefficients and threshold value (values of a logical function of two variables $x_1, x_2$)


5. Neural networks can be formally described as dynamic discrete systems with the 
use of the linear sequential machine technique. This provides the possibility not 
only to analyze the behavior of such systems by means of control theory methods, 
but also to synthesize neural network structures according to the given criteria. 

6. When neural networks are designed as uniform structures, the number of VLSI types can be minimized. This follows the trend toward simplifying VLSI computer-aided design systems. The relative independence of logical neural VLSI algorithm design from the dimensionality of the input and output spaces forms a basis for standardizing this procedure over a wider class of functional circuits implemented by neural networks.
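The parameter tolerance claimed in item 3 can be checked numerically. The particular function in Fig. 1.2 is not recoverable from the text, so the Python sketch below assumes AND, realized with hypothetical nominal values $a_1 = a_2 = 1$ and threshold $a_0 = 1.5$. Each truth-table condition is a half-space in $(a_1, a_2, a_0)$, so the set of parameters that realize AND is convex and it suffices to test the corners of the perturbation box.

```python
from itertools import product


def step(g):
    """Nonlinearity (1.2): 0 for g < 0, 1 for g >= 0."""
    return 1 if g >= 0 else 0


def realizes_and(a1, a2, a0):
    """True if y = step(a1*x1 + a2*x2 - a0) equals x1 AND x2 on all four inputs."""
    return all(step(a1 * x1 + a2 * x2 - a0) == (x1 & x2)
               for x1, x2 in product((0, 1), repeat=2))


def tolerates(delta, a1=1.0, a2=1.0, a0=1.5):
    """True if AND survives every corner of the box |da1|, |da2|, |da0| <= delta.

    The parameter set realizing AND is an intersection of half-spaces (convex),
    so checking the corners covers the whole box.
    """
    return all(realizes_and(a1 + s1, a2 + s2, a0 + s0)
               for s1, s2, s0 in product((-delta, delta), repeat=3))


for delta in (0.05, 0.15, 0.25):
    print(delta, tolerates(delta))
```

With these nominal values, simultaneous deviations of 15 % in every parameter still leave the function intact, while 25 % can break it at the (1, 1) input.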


1.2
Multi-Threshold Logics


Multi-threshold logics can be regarded as a generalization of threshold logics. The logical completeness of a multi-threshold element (MTE), which is the functional cell of a logical device based on multi-threshold logics, is connected with the existence of a group of thresholds implemented by this MTE. We shall consider an MTE as an element functioning according to the equation


$$y = \sum_{k=0}^{K} \left\{ \operatorname{sign}\left[g(n) - a_k\right] + 1 \right\} \qquad (1.3)$$

given in [1-6]. The MTE block scheme is shown in Fig. 1.3.
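Eq. (1.3) can be read as a staircase: every threshold $a_k$ that $g(n)$ reaches adds a step of height 2. The sketch below assumes that $g(n)$ is the weighted sum of the inputs, as for the single-threshold neuron, and that sign(0) = +1; both are assumptions, since (1.3) itself only gives y as a function of $g(n)$.

```python
from typing import Sequence


def sign(v: float) -> int:
    # Assumed convention: sign(0) = +1, so each term {sign(.) + 1} is 0 or 2.
    return 1 if v >= 0 else -1


def mte_output(x: Sequence[float], weights: Sequence[float],
               thresholds: Sequence[float]) -> int:
    """Multi-threshold element, Eq. (1.3): y = sum_k {sign[g(n) - a_k] + 1}."""
    g = sum(w * xi for w, xi in zip(weights, x))  # assumed form of g(n)
    return sum(sign(g - a_k) + 1 for a_k in thresholds)


# One input with unit weight and thresholds a_0 = 0, a_1 = 1, a_2 = 2:
for x1 in (-0.5, 0.5, 1.5, 2.5):
    print(x1, mte_output([x1], [1.0], [0.0, 1.0, 2.0]))
```

With three thresholds the output takes the four levels 0, 2, 4 and 6 as $x_1$ moves along the axis, i.e., a single element separates K + 2 intervals.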

Methods of MTE network synthesis have been studied for several years [1-7, 1-9, 1-10],
but they have not led to satisfactory results. The most promising methods for developing
MTE networks are adaptive methods [1-6, 1-10]. These methods depend only weakly on the
input space dimensionality and on the network complexity.
Fig. 1.3. A block scheme of an MTE


Fig. 1.4. A block scheme of a neuron with a continuum of solutions


1.3 
Continuous Logic 


This field of science is a natural development of two-valued logic through K-valued
logic. It is important for the theory of neural network computers, which considers
analog implementations requiring high operation speed. Continuous neural logic includes
circuits that perform logical operations on continuous variables [1-11 to 1-13].
Figure 1.4 shows the general structure of a neuron with a continuum of solutions, which
is the basis for designing continuous neural logic circuits. The behavior of the function
F (called the activation function) is considered below in relation to the neuron model
representation, the class of problems being solved, and the possibility of designing the
adaptation algorithm optimally.
The transformation performed by the circuit shown in Fig. 1.4 has the form

$$y(n) = F\left[g(n)\right] = F\left[\sum_{i=0}^{N} a_i x_i(n)\right]$$


The function F(g) is continuous, differentiable and monotonically increasing. It forms a
continuous output signal of the neural network element. In this case, the notion of a
separating surface implemented by the network in the initial feature space degenerates.
The parameters of the function F(g) can be fixed or adjustable.
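A direct transcription of this transformation is sketched below. It assumes, as is common but not stated in the text, that $x_0(n) \equiv 1$, so that $a_0$ plays the role of the threshold. The logistic function is used only as one admissible continuous, differentiable, increasing F; any other such function can be passed in.

```python
import math
from typing import Callable, Sequence


def logistic(g: float) -> float:
    """One admissible F: continuous, differentiable and increasing, with range (0, 1)."""
    if g >= 0:
        return 1.0 / (1.0 + math.exp(-g))
    e = math.exp(g)  # this branch avoids overflow of exp(-g) for very negative g
    return e / (1.0 + e)


def continuous_neuron(x: Sequence[float], a: Sequence[float],
                      F: Callable[[float], float] = logistic) -> float:
    """y(n) = F[sum_{i=0..N} a_i x_i(n)] with the assumed bias input x_0(n) = 1."""
    if len(a) != len(x) + 1:
        raise ValueError("expected coefficients a_0..a_N for inputs x_1..x_N")
    g = a[0] + sum(a_i * x_i for a_i, x_i in zip(a[1:], x))
    return F(g)


print(continuous_neuron([0.2, -0.4], [0.1, 1.0, 0.5]))
print(continuous_neuron([0.2, -0.4], [0.1, 1.0, 0.5], F=math.tanh))
```

Because F is continuous, nearby inputs give nearby outputs, which is why the notion of a sharp separating surface degenerates for such an element.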


1.4
Particular Forms of Activation Function 


One can consider several forms of activation function in addition to the aforemen- 
tioned ones. 


Activation function R is shown in Fig. 1.5. The existence of a linear part makes it
possible to implement continuous activation functions on the basis of such elements.
The activation function R is relatively simple to implement.


Activation function S (sigmoid function). This function is shown in Fig. 1.6. Its
expression is $y = (1 + e^{-g})^{-1}$.

Unlike the activation function R, the sigmoid function is invertible and continuously
differentiable, but it is more difficult to implement. Its disadvantage is that it takes
only positive values. However, we regard this disadvantage as insignificant, because it
can be eliminated at the level of the network structure by changing the element threshold.


Fig. 1.5. Activation function R


Fig. 1.6. Activation function S


Fig. 1.7. Activation function tanh(g(n))


Fig. 1.8. Activation function (2/π) arctg(g(n))

Activation functions tanh(g(n)) and (2/π) arctg(g(n)) are shown in Figs. 1.7 and 1.8.
These functions have properties similar to those of the sigmoid function. The advantages
of using a particular activation function are determined by the complexity of its neural
network implementation and by its suitability for the selected class of problems.
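The four activation functions can be compared numerically. The exact shape of R is given only graphically in Fig. 1.5, so the piecewise-linear form used here (0 below zero, linear on [0, 1], 1 above) is an assumption; S, tanh and (2/π) arctg follow the expressions in the text. The final check uses the identity 2S(g) - 1 = tanh(g/2): the positive-only range of S is removed by a fixed scaling and shift, which can be absorbed into the weights and threshold of the next element.

```python
import math


def ramp_R(g: float) -> float:
    # Assumed piecewise-linear form of R: 0 below 0, linear on [0, 1], 1 above 1.
    return min(1.0, max(0.0, g))


def sigmoid_S(g: float) -> float:
    # y = (1 + e^{-g})^{-1}, written to avoid overflow for large |g|.
    if g >= 0:
        return 1.0 / (1.0 + math.exp(-g))
    e = math.exp(g)
    return e / (1.0 + e)


def tanh_act(g: float) -> float:
    return math.tanh(g)


def arctan_act(g: float) -> float:
    # (2/pi) * arctg(g) maps the real line onto (-1, 1).
    return 2.0 / math.pi * math.atan(g)


for g in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"g={g:+.1f}  R={ramp_R(g):.3f}  S={sigmoid_S(g):.3f}  "
          f"tanh={tanh_act(g):+.3f}  (2/pi)arctg={arctan_act(g):+.3f}")

# Shifting and scaling S gives a symmetric function: 2*S(g) - 1 = tanh(g/2).
assert all(abs(2 * sigmoid_S(g) - 1 - math.tanh(g / 2)) < 1e-12 for g in (-3.0, 0.5, 2.0))
```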
Chapter 2


Qualitative Characteristics 
of Neural Network Architectures 


The main qualitative characteristics of neural network architectures are the following
[2-1, 2-5]:


1. Input signal type (dimensionality, discreteness, etc.);
2. Types of operations implemented in the open-loop neural network (discrete or
continuous);
3. Connection topology (direct, cross, lateral, backward, etc.);
4. Whether or not the network is intended to simulate a specific biological system
(visual or acoustic analyzer, cerebellum, thalamus, etc.);
5. Whether the goal is to maximize the operation speed;
6. Architecture limitations related to the user’s convenience or to the selected
technological methods;
7. Method of combining processor elements into groups;
8. Mode of operation in time (discrete or continuous);
9. Method of weighting coefficient modification (random or ordered);
10. Method of connecting independently tuned neural networks.


2.1 


Particular Types of Neural Network Architectures 


We describe below some particular neural network structures used to solve various
problems on a neural network basis. Figure 2.1 shows the structure of a so-called
neural network with direct connections.


Fig. 2.1. Neural network with direct connections


Fig. 2.2. Neural networks with cross connections


Fig. 2.3. Neural networks with ordered backward connections


Fig. 2.4. Neural networks with amorphous backward connections

One characteristic property of such a network is that the number of inputs, the number
of outputs and the number of neurons in each of the two network layers are equal.
Another characteristic property is the presence of so-called lateral connections between
neurons of the first and second layers. The lateral connections in Fig. 2.1 have a
limited structure (a limited area). Figure 2.2 shows a particular structure of a
two-layer neural network in which the adjustable weighting coefficients of the second
layer are determined by the output signals of the first layer.

Figure 2.3 shows an example of a neural network structure with ordered backward 
connections, and Fig. 2.4 shows an example of a neural network structure with amor- 
phous backward connections. 
Fig. 2.5. The main types of neural net[...]tion over the field of processor [...]
Figure 2.5 shows particular neural network structures with lateral connections that are
most often used in signal and pattern processing systems.


2.2 
Multilayer Neural Networks with Sequential Connections 


Historically, multilayer neural networks appeared in the theory of pattern recognition 
for the following reasons: 


1. A linear divisional surface (a linear threshold element) does not provide a
sufficient probability of correct recognition when the distributions differ from normal
distributions with equal covariance matrices.

2. A hyperplane cannot implement an arbitrary Boolean function of N binary variables in
N-dimensional space when N ≥ 2.

3. To increase the probability of correct recognition when the vector sets of two
pattern classes are distributed according to laws more complex than normal
distributions with equal covariance matrices, special pattern recognition systems
implementing a nonlinear divisional surface are usually constructed. In particular,
such a surface can be defined by the expression
Fig. 2.6. Block scheme of the neural network implementing a piecewise linear divisional
surface
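Item 2 in the list above can be checked for N = 2 by brute force. The sketch collects every truth table produced by a single threshold element $y = [a_1x_1 + a_2x_2 - a_0 > 0]$ over a small grid of integer weights and half-integer thresholds; the grid is an arbitrary assumption made only to keep the search small.

```python
import itertools

# Inputs (x1, x2) in a fixed order: (0,0), (0,1), (1,0), (1,1).
INPUTS = list(itertools.product((0, 1), repeat=2))


def truth_table(a1, a2, a0):
    """Outputs of one threshold element y = [a1*x1 + a2*x2 - a0 > 0] on all inputs."""
    return tuple(int(a1 * x1 + a2 * x2 - a0 > 0) for x1, x2 in INPUTS)


def realizable_functions(weights=range(-2, 3),
                         thresholds=(-2.5, -1.5, -0.5, 0.5, 1.5, 2.5)):
    """Truth tables reachable by a single threshold element over a finite parameter grid."""
    return {truth_table(a1, a2, a0)
            for a1 in weights for a2 in weights for a0 in thresholds}


found = realizable_functions()
xor = tuple(x1 ^ x2 for x1, x2 in INPUTS)
print(len(found), "of 16 functions realized; XOR realized:", xor in found)
```

The search finds 14 of the 16 Boolean functions of two variables; the missing ones are XOR and its complement. A finite grid proves nothing by itself, but the XOR case follows in one line: it requires $a_0 \ge 0$, $a_1 > a_0$, $a_2 > a_0$ and $a_1 + a_2 \le a_0$, while the middle two inequalities give $a_1 + a_2 > 2a_0 \ge a_0$.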
Implementation of pattern recognition systems with a nonlinear divisional surface is a
complex practical problem. It requires the adjustment of a large number of coefficients,
of the order of (N + r)!/(N! r!), where N is the dimensionality of the feature space and
r is the order of the divisional surface.

For example, when N is several hundred and r is about 6-8, the number of adjustable
weighting coefficients runs into billions. The difficulty of designing a nonlinear
divisional hypersurface is overcome by using its piecewise linear approximation. If a
hypersurface of order r is approximated sufficiently well from the viewpoint of the
probability of correct recognition, then the number of adjustable coefficients is Nr,
i.e., in the above example it amounts only to a few thousand. In pattern recognition,
and in some other tasks solved by a neural network whose first-layer neurons have a
finite number of solutions, one must assign the different areas of the initial
multidimensional feature space to particular classes (Fig. 2.6).
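The gap between the two coefficient counts can be reproduced directly. The helper names are hypothetical; the formulas are the two quoted in the text, with `math.comb` computing $(N + r)!/(N!\,r!)$ exactly.

```python
from math import comb


def polynomial_surface_coeffs(N: int, r: int) -> int:
    """(N + r)! / (N! r!): coefficients of a general order-r surface in N features."""
    return comb(N + r, r)


def piecewise_linear_coeffs(N: int, r: int) -> int:
    """N * r: the count quoted for the piecewise linear approximation."""
    return N * r


for N, r in [(100, 6), (200, 6), (300, 8)]:
    print(f"N={N}, r={r}: order-r surface {polynomial_surface_coeffs(N, r):.2e}, "
          f"piecewise linear {piecewise_linear_coeffs(N, r)}")
```

Already for N = 100 and r = 6 the first count is 1,705,904,746, while the piecewise linear count is 600; for N = 200 it is 1200.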

Such areas appear due to the intersection of the hypersurfaces implemented by the
neurons of the first layer. Each area is specified by a set of $H_1$ binary signals
($H_1$ is the number of neurons in the first layer), taking the values (0,1) or (+1,−1)
at the outputs of the first-layer neurons, together with the corresponding value of the
output signal of the whole system. In this specific case, the unit that assigns areas to
classes must implement some logical function $y(y_{h_1})$ of the $H_1$ binary signals.
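The scheme of Fig. 2.6 can be sketched with two threshold layers: the first layer maps a point to its binary area code, and a second threshold element realizes the logical function on that code. The hyperplanes $x_1 + x_2 = 0.5$ and $x_1 + x_2 = 1.5$ and the function $y = y_1 \wedge \neg y_2$ are hypothetical choices. Together they select the band between the two hyperplanes, which on binary inputs gives XOR, a function that a single hyperplane cannot implement.

```python
def step(v: float) -> int:
    """Two-valued neuron output with values (0, 1)."""
    return 1 if v > 0 else 0


# Hypothetical first layer (H1 = 2): hyperplanes x1 + x2 = 0.5 and x1 + x2 = 1.5.
FIRST_LAYER = [((1.0, 1.0), 0.5), ((1.0, 1.0), 1.5)]


def region_code(x):
    """Binary code (y_1, ..., y_H1) of the feature-space area containing x."""
    return tuple(step(w1 * x[0] + w2 * x[1] - a0) for (w1, w2), a0 in FIRST_LAYER)


def ranking_unit(code):
    """Second-layer threshold element realizing y = y_1 AND NOT y_2."""
    y1, y2 = code
    return step(y1 - y2 - 0.5)


def classify(x):
    return ranking_unit(region_code(x))


for point in [(0, 0), (0, 1), (1, 0), (1, 1), (0.9, 0.2), (2.0, 2.0)]:
    print(point, region_code(point), classify(point))
```

The code (0, 1) never occurs because the two hyperplanes are parallel, so the ranking unit only has to be correct on the three codes that actually appear.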

This logical function must in turn be realized by the neural network due to the 
known advantages of the threshold logic and due to the requirement of functional 
uniformity of the whole system. Figure 2.7 shows a graph of a multilayer neural net- 
work with sequential connections. 

A detailed classification of multilayer neural networks with sequential connections is
given in [2-6]. The classification is based on the following features:


- the number of neuron layers with adjustable coefficients;
- the number of neuron layers with fixed coefficients;
- the method of coefficient fixation.
Fig. 2.7. Graph of a multilayer neural network with sequential connections


The following structures are described in particular: a two-layer network with
adjustable coefficients of the first-layer neurons and fixed or adjustable coefficients
of the second-layer neurons, and a three-layer neural network with different
combinations of layers with adjustable and fixed coefficients.

The two-layer neural network with adjustable coefficients has limited capabilities for
implementing different configurations of the divisional surface, because a neuron (here,
the second-layer neuron) cannot realize an arbitrary Boolean function of N ≥ 2 binary
variables. If the number of neurons in the second layer is not limited, then a piecewise
linear surface of arbitrary configuration can be implemented by a neural network with no
more than three layers.

For a given number of neurons in the first and third layers, the synthesis of a
three-layer neural network reduces to minimizing the number of neurons in the second
layer and adjusting the network coefficients. Investigations of multilayer neural
networks with sequential connections have shown that their performance quality increases
monotonically with the number of layers and the number of neurons in each layer.


2.3 
Structural and Symbolic Description of Multilayer Neural Networks 


Recently, structural methods have become much more significant than symbolic methods in
the investigation of various systems. The main reasons for this trend are such
properties of these systems as their multilayer character, multiple-image character and
high dimensionality, and modern neural networks possess exactly these properties. The
present study aims to develop a structural approach. In this approach, the relative role
of developing the learning block decreases, and correspondingly the role of choosing the
open-loop system structure increases. For this reason, in addition to symbolic
descriptions of the open-loop system, a structural representation of its transformations
is necessary. We give below a formal description of the main types of multilayer neural
networks.
Here $N = H_0$ is the dimensionality of the initial feature space.
The arrow and symbol indicate the designation of the signal that is described in the
equation by the expression to the right of the arrow, and

$$g^{(W-j+1)}_{h_{W-j+1}}(n) \quad \text{and} \quad y^{(W-j+1)}_{h_{W-j+1}}(n)$$

are, respectively, the analog output signal and the digital output signal of the
$h_{W-j+1}$-th neuron of the $(W-j+1)$-th layer of the considered multilayer neural
network.

A multilayer neural network with K solutions is obtained by replacing the nonlinear
transformation f in (2.2) with the expression given by Eq. (1.3).
c) Multilayer neural networks with $H_W$ output channels.

A symbolic description of such a system is relatively simple to obtain from Eq. (2.2)
and from the graph of the neural network shown in Fig. 2.7. In particular, one can
consider the case of signals ε(n) and y(n) of equal dimensionality.

d) Multilayer neural networks with cross connections.

In multilayer neural networks with full cross connections [1-6], the feature set of the
j-th layer (j = 1, ..., W) consists of the features of the initial space and the output
signals of all layers with numbers from 1 to (j − 1).

Analysis of particular structures with full cross connections shows that they require
significantly fewer neurons than structures with full sequential connections when both
structures implement the same configuration of divisional surfaces in the feature space.
In particular, for the two-layer neural network with cross connections,

$$y(n) = F\left[\sum_{i=0}^{N} a_{p,i}\, x_i(n) + \sum_{h_1=0}^{H_1} a_{p,h_1}\, F\left(\sum_{i=0}^{N} a_{h_1 i}\, x_i(n)\right)\right]$$

A graph-scheme of such a neural network is shown in Fig. 2.8.
It is possible in principle to consider a multilayer cross connection neural net- 
work of an arbitrary structure. 
e) Multilayer neural networks with backward connections.
For the neuron with a backward connection (Fig. 2.9):

$$y(n) = F\left[\sum_{i=0}^{N} a_i x_i(n) + a' y(n-1)\right]$$
For the multilayer neural network with backward connections (Fig. 2.10):
Fig. 2.8. Graph-scheme of the neural network with cross connections


Fig. 2.9. Graph-scheme of a neuron with a backward connection


Fig. 2.10. Graph-scheme of a two-layer neural network with backward connections


Fig. 2.11. Hyperplanes realized in the input feature space by the neuron with k-valued
weighting coefficients


In principle, multilayer neural networks of an arbitrary given structure with both
backward and cross connections can be considered. The objective reason for introducing
cross connections in a multilayer neural network is substantiated below in Chap. 3.
Backward connections are considered in the present book in the investigation of
closed-loop neural networks for nonstationary pattern recognition.

f) Neural networks with k-valued and binary coefficients.

The difficulties of physically implementing adjustable weighting coefficients in
multilayer neural networks are well known. They arose, in particular, during the
development of memistor systems in the 1960s, whose authors tried to implement both the
open-loop system and the neural network adjustment block in analog form [2-12].

These difficulties remain at the modern stage of VLSI technology. However, the sharp
increase in the integration level makes it possible to implement neural networks with
neurons having k-valued weighting coefficients realized, for example, on resistor
networks. In the simplest case, binary values (0,1) of the weighting coefficients,
realized on controlled switches, can be used. This sharply simplifies the physical
implementation of the adjustment procedure of a multilayer neural network built from
such adjustable coefficients. Each neuron with k-valued or binary coefficients realizes
a logical function in the space of input variables. This is done by changing the slope
of the divisional hyperplane by some fixed level (Fig. 2.11) or by using hyperplane
“parts” of three types on the total hypersurface (Fig. 2.12).


Fig. 2.12. Hyperplanes realized in the input feature space by the neuron with binary
values of weighting coefficients (0,1)


Fig. 2.13. Multilayer neural network

Evidently, the fewer gradations the weighting coefficients of the neurons have, the more
neurons the network needs to solve a given problem.

The present level of technological development is quite ready for the general neural
network structure shown in Fig. 2.13. Methods of synthesizing adjustment algorithms for
such neural networks are the main subject of the present book.
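To make the effect of weight gradations concrete, the sketch below counts how many distinct hyperplane orientations a single two-input neuron can realize when its weights are restricted to a few levels. It assumes a two-dimensional feature space and nonnegative integer levels {0, 1, ..., k − 1}; both are illustrative choices, not the book's adjustment method.

```python
from itertools import product
from math import gcd


def hyperplane_orientations(levels):
    """Distinct orientations of a1*x1 + a2*x2 = const with a1, a2 taken from `levels`.

    Weight vectors that are positive multiples of each other give parallel
    hyperplanes, so each nonzero vector is reduced by the gcd of its components.
    """
    directions = set()
    for a1, a2 in product(levels, repeat=2):
        if a1 == 0 and a2 == 0:
            continue  # the zero vector defines no hyperplane
        d = gcd(a1, a2)
        directions.add((a1 // d, a2 // d))
    return directions


for k in (2, 3, 4):
    levels = range(k)  # assumed nonnegative levels 0, 1, ..., k - 1
    print(k, len(hyperplane_orientations(levels)), sorted(hyperplane_orientations(levels)))
```

Binary weights (0, 1) give exactly three orientations, which is consistent with the three types of hyperplane parts mentioned for Fig. 2.12 in the two-dimensional case; three and four levels give 5 and 9. With fewer orientations available, a divisional surface of a given shape needs more hyperplane pieces and hence more neurons.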
Chapter 3


Optimization of Cross Connection Multilayer 
Neural Network Structure 


Selecting the structure of an open-loop multilayer neural network is a very complex
problem. The structure can be chosen a priori, on the basis of the considerations given
above for two-layer and three-layer neural networks, or according to technological
limitations. Below we consider how to select the structure (the number of layers and the
number of neurons in each layer) of a multilayer neural network with cross connections
consisting of neurons with two solutions.


3.1
About the Problem Complexity Criterion 


It is necessary to discuss the choice of a complexity criterion for a pattern recognition
task solved by a multilayer neural network. When a deterministic neural network model is
used, such a criterion can be the number of reference patterns enclosed in the closed
areas formed by the hyperplanes realized by the first-layer neurons in the initial
feature space. In the case of a probabilistic neural network model, each reference
pattern corresponds to a mode of the probability distribution of the pattern set at the
neural network input. In each area of the initial feature space, the multilayer neural
network selects a compact set of patterns rather than a reference pattern. When the
pattern set at the neural network input has a multimodal distribution, these compact sets
can be characterized by areas of the multidimensional feature space bounded by lines of
equal probability density (at a certain level). The number and complexity of such areas
together characterize the complexity of the problem being solved. The deterministic
neural network model can be considered a particular case of the probabilistic one; in
essence, it realizes a memory for a finite number of multidimensional vectors. In this
section, the number of areas realized by the multilayer neural network in the initial
feature space is taken as the quality criterion of this neural network. As noted above,
the quality of a multilayer neural network with sequential connections increases with
the number of layers and the number of neurons in each layer.

Therefore, the problem of neural network optimization (minimization of the numbers of
layers and neurons) is formulated either as the elimination of redundancy in the number
of neurons or under a limitation on the number of neurons.

The main attention below is paid to multilayer neural networks with full cross
connections. In this case, the feature set of each j-th layer consists of the features of
the initial space and the output signals of the first, second, ..., (j − 1)-th layers.
The problem of structure optimization for such a neural network is of current interest.
3.2
One-Dimensional Variant of the Neural Network 
with Cross Connections 


Let us consider how a cross connection works using the simplest one-dimensional
example, N = 1 (one feature x). Figure 3.1 shows a block scheme of such a neural network.
The divisional surface realized by the network without a cross connection is shown in
Fig. 3.2.

In regions I, II and III, the analog output signal g of the neural network with the
cross connection activated has the form

$$g_{I} = a_0 + a_n x - a_1 - a_2$$

$$g_{II} = a_0 + a_n x + a_1 - a_2$$

$$g_{III} = a_0 + a_n x + a_1 + a_2$$

The neural network divides each of the regions I, II and III into two subregions, where
$g \ge 0$ and $g < 0$. From the conditions $g_I = 0$, $g_{II} = 0$, $g_{III} = 0$ one
obtains the expressions for the additional thresholds in the space X that appear when
the cross connection is activated:

$$x_1 = \frac{a_1 + a_2 - a_0}{a_n}, \qquad x_2 = \frac{a_2 - a_1 - a_0}{a_n}, \qquad x_3 = \frac{-a_1 - a_2 - a_0}{a_n}$$


Thus, the considered neural network (Fig. 3.1) realizes no more than five thresholds,
which divide the x axis into six regions. In this case the network is equivalent to a
multilayer neural network with sequential connections having five neurons in the first
layer. This means that the neural network with cross connections is realized much more
simply than the neural network with sequential connections.
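The one-dimensional argument can be verified numerically. The sketch assumes the structure implied by the expressions for $g_I$, $g_{II}$, $g_{III}$: two first-layer neurons with outputs ±1 and thresholds $t_1 < t_2$ (giving regions I, II, III), and an output neuron that receives x through the cross connection with weight $a_n$ plus the two first-layer outputs with weights $a_1$, $a_2$. The parameter values are hypothetical, picked so that each additional threshold $x_1, x_2, x_3$ falls inside its own region.

```python
def sgn(v: float) -> int:
    # Two-valued neuron output; sign(0) = +1 is an assumed convention.
    return 1 if v >= 0 else -1


# Hypothetical parameters: first-layer thresholds T1 < T2 and output-neuron weights.
T1, T2 = 0.0, 10.0
A0, AN, A1, A2 = -5.0, 1.0, -5.0, -5.0


def cross_net(x: float) -> int:
    y1 = sgn(x - T1)  # first-layer neuron 1
    y2 = sgn(x - T2)  # first-layer neuron 2; (y1, y2) = (-1,-1), (1,-1), (1,1) in I, II, III
    g = A0 + AN * x + A1 * y1 + A2 * y2  # AN * x is the cross connection from the input
    return sgn(g)


def additional_thresholds(a0, an, a1, a2):
    """Roots of g_I = 0, g_II = 0, g_III = 0, i.e., x_1, x_2, x_3."""
    return ((a1 + a2 - a0) / an, (a2 - a1 - a0) / an, (-a1 - a2 - a0) / an)


def sign_changes(f, lo=-200, hi=300, scale=10.0):
    """Scan x = (k + 0.5) / scale and return the midpoints where f changes sign."""
    xs = [(k + 0.5) / scale for k in range(lo, hi)]
    ys = [f(x) for x in xs]
    return [round((xs[i] + xs[i + 1]) / 2, 6)
            for i in range(len(xs) - 1) if ys[i] != ys[i + 1]]


print(additional_thresholds(A0, AN, A1, A2))
print(sign_changes(cross_net))
```

With these values the output changes sign at x = −5, 0, 5, 10 and 15: the two first-layer thresholds plus the three additional ones, i.e., five thresholds and six regions obtained with only three neurons.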


Fig. 3.1. Two-layer neural network with a cross connection, one-dimensional variant


Fig. 3.2. To the principle of cross connection in multilayer neural networks
In analyzing a multilayer neural network, one must know the maximum number of regions
into which a feature space of dimensionality N can be divided by $H_1$ hyperplanes. This
maximum number of regions, $\Psi_{N,H_1}$, is determined by the following recurrent
equation:


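$$\Psi_{N,H_1} = \Psi_{N,H_1-1} + \Psi_{N-1,H_1-1} \qquad (3.1)$$

or in the non-recurrent form

$$\Psi_{N,H_1} = \sum_{i=0}^{N} C_{H_1}^{i}$$

It is implied that $C_t^s = 0$ if $t < s$. The following expressions can be derived from
(3.1):

$$\Psi_{N,H_1} = 2^{H_1}, \quad \text{if } H_1 \le N \qquad (3.2)$$

and

$$\Psi_{N,H_1} < 2^{H_1}, \quad \text{if } H_1 > N \qquad (3.3)$$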
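Eq. (3.1), its non-recurrent form, and the consequences (3.2) and (3.3) can be checked against each other for small N and H. The boundary values $\Psi_{0,H} = \Psi_{N,0} = 1$ used to start the recursion are an assumption not written out in the text (a zero-dimensional space, or a space without hyperplanes, is a single region).

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def psi_recurrent(N: int, H: int) -> int:
    """Eq. (3.1): Psi(N, H) = Psi(N, H - 1) + Psi(N - 1, H - 1)."""
    if N == 0 or H == 0:
        return 1  # assumed boundary values of the recursion
    return psi_recurrent(N, H - 1) + psi_recurrent(N - 1, H - 1)


def psi_closed(N: int, H: int) -> int:
    """Non-recurrent form: sum of C(H, i), i = 0..N; math.comb returns 0 when i > H."""
    return sum(comb(H, i) for i in range(N + 1))


for N in range(6):
    for H in range(9):
        assert psi_recurrent(N, H) == psi_closed(N, H)
        if H <= N:
            assert psi_closed(N, H) == 2 ** H   # Eq. (3.2)
        else:
            assert psi_closed(N, H) < 2 ** H    # Eq. (3.3)

print(psi_closed(2, 3))  # three lines in general position cut the plane into 7 regions
```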
3.3 


Calculation of Upper and Lower Estimation of the Number of Regions 


Let us consider a multidimensional variant (i = 1,..., N) of the neural network with the 
structure shown in Fig. 3.3. 

The number of areas into which the initial feature space is divided by the
(j − 1)-layer neural network is designated as $\Psi_{N,[j-1]}$, where [j − 1]
conditionally indicates the equivalent number of hyperplanes realized by the multilayer
neural network with full cross connections and (j − 1) layers. Such a network consists of

$$L_{j-1} = \sum_{i=1}^{j-1} H_i$$

neurons, where $H_i$ is the number of neurons in the i-th layer of the neural network.
The input channels of each $h_j$-th neuron of the j-th layer ($h_j = 1, \dots, H_j$) can
be divided into two sets, as follows from the block scheme. The first set consists of
the input signals of the neural network. The second one consists of the output signals
of the neural network from the first, second, ..., (j − 1)-th layers. Then the equation
of the divisional surface realized by one $h_j$-th neuron of the j-th layer takes the
form

$$a_{h_j}^{T} x + a_{h_j}'^{T} x_{[j-1]} - a_{0h_j} = 0$$

Here $a_{h_j}$ is the vector of adjustable weights of the neural network input signals of
the $h_j$-th neuron; $a'_{h_j}$ is the vector of adjustable weights of the $h_j$-th
neuron for the intermediate signals produced by the (j − 1)-layer network; $a_{0h_j}$ is
the threshold of the $h_j$-th neuron; and $x_{[j-1]}$ is the vector of output signals of
the neurons of the (j − 1)-layer network.

Therefore, relative to the initial feature space, each neuron of the j-th layer
implements as many parallel hyperplanes as there are variants of the vector
$x_{[j-1]}$ generated by the (j − 1)-layer neural network. Let us assume that there
exists an adjustment procedure in which every hyperplane of the j-th layer generated by a
variant of $x_{[j-1]}$ lies in the area of the initial feature space corresponding to
that variant. Then the recurrent equation for the upper estimation of the number of
regions can be written in the form

$$\Psi_{N,[j]} = \Psi_{N,[j-1]}\,\Psi_{N,H_j} \qquad (3.4)$$


This follows from the fact that each of the $\Psi_{N,[j-1]}$ regions selected by the
(j − 1)-layer subnetwork is divided into $\Psi_{N,H_j}$ regions, where $\Psi_{N,H_j}$ is
determined by the recurrent relationship (3.1).

Let us now find a non-recurrent equation. Equation (3.4) and the fact that the first
layer of the neural network divides the feature space into $\Psi_{N,H_1}$ areas yield

$$\Psi_{N,[j]} = \prod_{i=1}^{j} \Psi_{N,H_i} \qquad (3.5)$$


To derive the lower estimation of the number of regions, let us assume that several of
the $\Psi_{N,[j-1]}$ hyperplanes can be passed through any point of the initial space by
changing only the free term in the hyperplane equation. These hyperplanes can belong to
any area. To estimate their number, consider the system of linear equations in the
adjustable weights and the threshold of the $h_j$-th element:

$$\begin{bmatrix} x_{[j-1]}^{(1)T} & -1 \\ x_{[j-1]}^{(2)T} & -1 \\ \vdots & \vdots \\ x_{[j-1]}^{(\Psi_{N,[j-1]})T} & -1 \end{bmatrix} \begin{bmatrix} a'_{h_j} \\ a_{0h_j} \end{bmatrix} = \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_{\Psi_{N,[j-1]}} \end{bmatrix} \qquad (3.6)$$

where $x_{[j-1]}^{(i)}$ are the binary variants of the vector $x_{[j-1]}$.
Here $q_i$ ($i = 1, \dots, \Psi_{N,[j-1]}$) are arbitrary given values. The number of
values $q_i$ for which (3.6) can be satisfied is the required number of hyperplanes, out
of $\Psi_{N,[j-1]}$, that can be passed through any point of the initial space. This
number is $(L_{j-1} + 1)$, i.e., the dimensionality of the vector $a'_{h_j}$ plus one, as
follows from (3.5).

Then the following recurrent equation for the lower estimation of the number of regions
is valid:

$$\Psi_{N,[j]} = \Psi_{N,[j-1]} - (L_{j-1} + 1) + (L_{j-1} + 1)\,\Psi_{N,H_j}$$

Here $\Psi_{N,[j-1]} - (L_{j-1} + 1)$ is the number of regions through which no new
hyperplanes are passed, and $(L_{j-1} + 1)\Psi_{N,H_j}$ is the number of new regions that
emerge after the partition.

Finally one obtains

$$\Psi_{N,[j]} = \Psi_{N,[j-1]} + (L_{j-1} + 1)\left(\Psi_{N,H_j} - 1\right) \qquad (3.7)$$

Expression (3.7) is the final result. In the one-dimensional case, (3.7) takes the form

$$\Psi_{1,[j]} = \Psi_{1,[j-1]} + (L_{j-1} + 1)\,H_j \qquad (3.8)$$
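The two estimates can be computed side by side. The sketch assumes that before the first layer the space is a single region with $L_0 = 0$, which makes the first step of (3.7) return $\Psi_{N,H_1}$ as stated in the text; layers are passed as a list of $H_j$, and the function names are hypothetical.

```python
from math import comb
from typing import Sequence


def psi(N: int, H: int) -> int:
    """Maximum number of regions cut by H hyperplanes in N dimensions, Eq. (3.1)."""
    return sum(comb(H, i) for i in range(N + 1))


def upper_estimate(N: int, layers: Sequence[int]) -> int:
    """Eq. (3.5): product of Psi(N, H_j) over all layers."""
    result = 1
    for H_j in layers:
        result *= psi(N, H_j)
    return result


def lower_estimate(N: int, layers: Sequence[int]) -> int:
    """Eq. (3.7) applied layer by layer, starting from one region and L_0 = 0."""
    regions, L_prev = 1, 0
    for H_j in layers:
        regions += (L_prev + 1) * (psi(N, H_j) - 1)
        L_prev += H_j  # L_j = H_1 + ... + H_j neurons so far
    return regions


for N, layers in [(1, [2, 1]), (1, [1, 1, 1]), (2, [3, 3])]:
    print(N, layers, lower_estimate(N, layers), upper_estimate(N, layers))
```

For N = 1 and layers [2, 1], the one-dimensional example considered above, both estimates give 6 regions, in agreement with its five thresholds; for [1, 1, 1] they give 7 and 8, and for N = 2 with layers [3, 3] they give 31 and 49.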


3.4 
Particular Optimization Problem 


Several problems of structure optimization for a multilayer neural network with cross
connections can be formulated.


1. The number of layers and the number of neurons are given. It is required to find the
distribution of neurons among the layers that maximizes the number of regions Ψ formed
by the piecewise linear divisional surface realized by the given multilayer neural
network in the initial feature space.

2. The total number of neurons is given. It is required to find the number of layers and
the distribution of neurons among the layers that maximize the number of regions.

3. The number of regions Ψ that must be realized by the network and the number of layers
are given. It is required to find the structure that minimizes the number of neurons.

4. The number of regions Ψ is given. It is required to find the structure (the number of
layers and the distribution of neurons among the layers) that minimizes the number of
network neurons. Note that structure optimization by the number of regions represents a
particular optimization criterion of the neural network.


Let us consider the structure synthesis of a one-dimensional variant of the network 
for the aforementioned optimization problems. 
1. The number of layers W and the number of neurons

$$H = \sum_{j=1}^{W} H_j$$

are given. Let us find the distribution of neurons among the layers that maximizes the
number of regions $\Psi_{1,[W]}$. Taking (3.5) and (3.1) into account, the problem is
stated formally as

$$\Psi_{1,[W]}^{\text{opt}} = \max_{H_1,\dots,H_W} \prod_{j=1}^{W} (H_j + 1), \qquad \sum_{j=1}^{W} H_j = H$$

The Lagrange multiplier method gives the solution in the form of the system of equations

$$\prod_{\substack{j=1 \\ j \ne i}}^{W} (H_j + 1) + \lambda = 0, \qquad i = 1, \dots, W \qquad (3.9)$$


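The solution of the system (3.9) is

$$H_j = \frac{1}{W} H \quad (j = 1, \dots, W), \qquad \lambda = -\left(1 + \frac{H}{W}\right)^{W-1} \qquad (3.10)$$

It follows from (3.10) and (3.5) that

$$\Psi_{1,[W]}^{\text{opt}} = \left(1 + \frac{H}{W}\right)^{W} \qquad (3.11)$$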


i.e., for a given number of layers the given neurons must be distributed uniformly among
the layers. In connection with (3.10), the question of the integrality of $H_j$
(j = 1, ..., W) arises. If H is not divisible by W, then it follows from (3.5) and (3.10)
that the remaining neurons must also be distributed as uniformly as possible, at most one
per layer, among arbitrarily chosen layers.

In this sense, Eq. (3.11) is an upper estimation of $\Psi_{1,[W]}^{\text{opt}}$, which
becomes exact when H = KW, where K is an integer.
2. The total number of layers W is not given beforehand, and the number of neurons H in
the network is limited. Let us find the structure that is optimal by the upper
estimation. This can be expressed in the following form:

$$\Psi_{1}^{\text{opt}} = \max_{W} \Psi_{1,[W]}^{\text{opt}}, \qquad \sum_{j=1}^{W} H_j = H$$

From the evident inequality following from (3.11), one gets

$$\left(1 + \frac{H}{W}\right)^{W} < \left(1 + \frac{H}{W+1}\right)^{W+1},$$

and consequently the number of regions increases monotonically with the number of
layers. Consequently, the H-layer neural network with one neuron in each layer is
optimal. It follows from (3.11) that for this network

$$\Psi_{1,[H]}^{\text{opt}} = \left(1 + \frac{H}{H}\right)^{H} = 2^{H}$$

is the exact upper estimation.

3. The number of layers W and the number of neurons are given. Let us find the structure
that is optimal by the lower estimation. To do this, let us represent (3.8) in
non-recurrent form:

$$\Psi_{1,[W]} = 1 + \sum_{j=1}^{W} H_j + \frac{1}{2}\left(\sum_{j=1}^{W} H_j\right)^{2} - \frac{1}{2}\sum_{j=1}^{W} H_j^{2} \qquad (3.12)$$

According to the Lagrange multiplier method, the constrained extremum of (3.12) under the
condition

$$\sum_{i=1}^{W} H_i = H$$

is achieved when $H_i$ (i = 1, ..., W) is a solution of the system

$$-H_i + \lambda + H + 1 = 0, \qquad i = 1, \dots, W$$

$$\sum_{i=1}^{W} H_i - H = 0$$

Then $H_1 = H_2 = \dots = H_W = (1/W)H$ and, for a given W,

$$\Psi_{1,[W]}^{\text{opt}} = 1 + H + \frac{H^2}{2} - \frac{H^2}{2W} \qquad (3.13)$$
2 2W 
It follows from (3.13) that $\Psi_{1,[W]}^{\text{opt}}$ increases monotonically with W,
and the estimate is exact when H = KW, where K is an integer. Consequently,
$\Psi_{1,[H]}^{\text{opt}} = 1 + (H + H^2)/2$ for the H-layer neural network with one
neuron in each layer. Hence, in the one-dimensional case (N = 1), the structures that are
optimal by the lower and by the upper estimations coincide.

4. Similarly to the one-dimensional case, for the multidimensional variant of the neural
network that is optimal by the upper estimation, on the basis of (3.5) one can write

$$\Psi_{N}^{\text{opt}} = \max_{W \le H} \max_{H_1,\dots,H_W} \prod_{j=1}^{W} \Psi_{N,H_j}, \qquad \sum_{j=1}^{W} H_j = H \qquad (3.14)$$

It follows from (3.14), (3.2) and (3.3) that a whole class of structures satisfies the
optimality conditions (3.14). Namely, this holds for all structures with $H_j \le N$
(j = 1, ..., W) and

$$\sum_{j=1}^{W} H_j = H \qquad (3.15)$$

For these structures,

$$\Psi_{N,[W]} = \prod_{j=1}^{W} 2^{H_j} = 2^{H} \qquad (3.16)$$

For the structures with $H_j > N$, for each such layer j: $\Psi_{N,H_j} < 2^{H_j}$.
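For small H, the optimization statements above can be checked by exhaustive search. The sketch assumes integer $H_j \ge 1$ in every layer and H = 6, an arbitrary small value. It evaluates the upper estimation (3.5), the one-dimensional lower estimation (3.12) and, for N = 3, the class of structures described by (3.15) and (3.16).

```python
from itertools import product
from math import comb, prod


def psi(N, H):
    """Maximum number of regions cut by H hyperplanes in N dimensions, Eq. (3.1)."""
    return sum(comb(H, i) for i in range(N + 1))


def upper(N, layers):
    """Upper estimation, Eq. (3.5)."""
    return prod(psi(N, h) for h in layers)


def lower_1d(layers):
    """One-dimensional lower estimation in non-recurrent form, Eq. (3.12)."""
    s = sum(layers)
    return 1 + s + (s * s - sum(h * h for h in layers)) // 2


def distributions(H, W):
    """All placements of H neurons into W layers with at least one neuron per layer."""
    return [c for c in product(range(1, H + 1), repeat=W) if sum(c) == H]


H = 6
best_upper = [max(upper(1, c) for c in distributions(H, W)) for W in range(1, H + 1)]
best_lower = [max(lower_1d(c) for c in distributions(H, W)) for W in range(1, H + 1)]
print(best_upper)  # grows with W; equals (1 + H/W)**W whenever W divides H
print(best_lower)  # grows with W; equals 1 + H + H**2/2 - H**2/(2W) whenever W divides H

# N = 3: every structure with all H_j <= N reaches 2**H, Eqs. (3.15)-(3.16).
print({upper(3, c) for W in range(1, H + 1) for c in distributions(H, W) if max(c) <= 3})
```

The search gives [7, 16, 27, 36, 48, 64] for the upper estimation and [7, 16, 19, 20, 21, 22] for the lower one, both maximal for the six-layer network with one neuron per layer (64 = 2^6 and 22 = 1 + (6 + 36)/2), and every structure with $H_j \le 3$ reaches 64 for N = 3.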


3.5 
Structural Optimization by Some Main Topological Characteristics 


When a network is implemented technologically, there is a natural desire to decrease the
total number of its inputs. This is because the number of inputs equals the number of
multiplier units, which are technologically complex to realize. At the adjustment stage
of the selected structure, the number of inputs equals the dimensionality of the space of
adjustable coefficients. Therefore, decreasing the number of neuron inputs of the
multilayer neural network simplifies both implementation and adjustment.

The total number of neuron inputs in the i-th layer of the neural network with full
cross connections is equal to

$$\gamma_i = H_i\left(N + \sum_{j=1}^{i-1} H_j\right)$$


Consequently, the expression for the total number of neuron inputs in the W-layer 


neural network is 


WwW Wii WwW ft WwW 
Yw=do4j= D0 DI, +N Hj=N) Hj +>H ep (3.17) 
j=l j=l[j=1 j=l j=l 
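The two forms of (3.17) can be compared directly: the layer-by-layer count follows the definition of full cross connections (each neuron of layer $j$ sees the $N$ inputs plus the outputs of all earlier layers), and the closed form is the right-hand side of (3.17). A sketch with hypothetical function names:

```python
def inputs_by_layer(N, layers):
    """Count inputs layer by layer: a layer-j neuron sees N inputs plus all earlier outputs."""
    total, earlier = 0, 0
    for h in layers:
        total += h * (N + earlier)
        earlier += h
    return total


def inputs_closed_form(N, layers):
    """Right-hand side of (3.17): N*S + (S**2 - sum of H_j**2) / 2 with S = sum of H_j."""
    s = sum(layers)
    squares = sum(h * h for h in layers)
    return N * s + (s * s - squares) // 2  # S**2 - sum H_j**2 = 2 * sum_{i<j} H_i H_j is even


print(inputs_by_layer(4, [3, 2, 1]), inputs_closed_form(4, [3, 2, 1]))  # 35 35
```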
The problem of synthesizing a multilayer neural network with full cross connections that is optimal by the upper or lower estimate of the number of regions, under a limitation on the number $\gamma$ of network inputs, is formulated on the basis of (3.17) as follows:

$$\Psi^{*}_{N[W]} = \max_{W}\ \max_{H_1,\dots,H_W} \Psi_{N[W]}, \qquad \gamma \ge N\sum_{j=1}^{W} H_j + \frac{1}{2}\Bigl(\sum_{j=1}^{W} H_j\Bigr)^{2} - \frac{1}{2}\sum_{j=1}^{W} H_j^{2}. \tag{3.18}$$


The index * denotes an extremum value. Taking (3.17) into account, the inverse problem, i.e., the synthesis of a multilayer neural network structure with full cross connections and the minimum total number of inputs under a limitation $\Psi$ on the number of regions, has the form

$$\gamma^{*}_{W} = \min_{W}\ \min_{H_1,\dots,H_W} \Bigl[N\sum_{j=1}^{W} H_j + \frac{1}{2}\Bigl(\sum_{j=1}^{W} H_j\Bigr)^{2} - \frac{1}{2}\sum_{j=1}^{W} H_j^{2}\Bigr], \qquad \Psi_{N[W]} \ge \Psi. \tag{3.19}$$

The number of regions $\Psi_{N[W]}$ in (3.18) and (3.19) is determined either by (3.5) or by (3.7), depending on the form of the estimate.

The inverse statement of the synthesis problem is less practical than the direct one, because a limitation on the number of input channels of the neural network is more physically meaningful than a limitation on the number of regions.


Example 1. Let us show that, in the one-dimensional case, the structure of the multilayer neural network that is optimal by the upper estimate of the number of regions under a limitation on the number of elements is also optimal by the upper estimate under a limitation on the total number of inputs. According to the Lagrange method of multipliers and (3.18) with $N = 1$, the optimal $H_j$ and $\lambda$ are the solutions of the following system of equations:

$$\prod_{\substack{i=1\\ i\ne j}}^{W} (H_i + 1) + \lambda\Bigl(1 + \sum_{\substack{i=1\\ i\ne j}}^{W} H_i\Bigr) = 0, \qquad j = 1,\dots,W;$$

$$\sum_{j=1}^{W} H_j + \frac{1}{2}\Bigl(\sum_{j=1}^{W} H_j\Bigr)^{2} - \frac{1}{2}\sum_{j=1}^{W} H_j^{2} - \gamma = 0. \tag{3.20}$$

Here $\lambda$ is the Lagrange multiplier.
The solution of (3.20) is

$$H_j = 1, \quad j = 1,\dots,W; \qquad \lambda = -\frac{2^{W-1}}{W}. \tag{3.21}$$

Here $W$ is the number of layers. With $H_j = 1$, (3.17) gives $\gamma = (W^2 + W)/2$, so $W$ is the integer part of the positive root of the equation $W^2 + W - 2\gamma = 0$.
Equation (3.21) proves the initial statement.
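The stationary point (3.21) and the choice of $W$ can be checked numerically. This sketch assumes that for $N = 1$ the upper estimate is the product form of (3.14) with $\Psi_{1[H]} = H + 1$, i.e. $\prod_j (H_j + 1)$. It verifies in exact arithmetic that (3.21) satisfies (3.20) and, by brute force over all layer structures, that for budgets $\gamma = W(W+1)/2$ (where the positive root is an integer) no structure within the budget exceeds the $2^W$ regions of the one-neuron-per-layer network. All function names are hypothetical.

```python
from fractions import Fraction
from math import isqrt, prod


def gamma_1d(layers):
    """Total number of inputs (3.17) for N = 1."""
    s = sum(layers)
    return s + (s * s - sum(h * h for h in layers)) // 2


def stationarity_residuals(W):
    """Left-hand sides of (3.20) at H_j = 1, lambda = -2**(W-1)/W, gamma = (W**2 + W)/2."""
    H = [1] * W
    lam = Fraction(-(2 ** (W - 1)), W)
    residuals = []
    for j in range(W):
        others = H[:j] + H[j + 1:]
        residuals.append(prod(h + 1 for h in others) + lam * (1 + sum(others)))
    residuals.append(gamma_1d(H) - (W * W + W) // 2)
    return residuals


def layers_from_budget(gamma):
    """Integer part of the positive root of W**2 + W - 2*gamma = 0."""
    return (isqrt(1 + 8 * gamma) - 1) // 2


def partitions(total, max_part=None):
    """Multisets of layer sizes summing to `total` (gamma and Psi ignore layer order)."""
    max_part = total if max_part is None else max_part
    if total == 0:
        yield ()
        return
    for first in range(min(total, max_part), 0, -1):
        for rest in partitions(total - first, first):
            yield (first,) + rest


def best_upper_estimate(gamma):
    """Brute-force max of prod(H_j + 1) over all structures with gamma_1d <= gamma."""
    return max(
        prod(h + 1 for h in layers)
        for s in range(1, gamma + 1)
        for layers in partitions(s)
        if gamma_1d(layers) <= gamma
    )


for W in (3, 4, 5):
    budget = W * (W + 1) // 2
    print(budget, layers_from_budget(budget), best_upper_estimate(budget), 2 ** W)
```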


Example 2. Let us consider the synthesis of the neural network according to (3.19), where $\Psi_{N[W]}$ is determined by expression (3.5). Note that transferring a neuron from the $j$-th layer into the $(j - j_1)$-th layer decreases the number of its inputs in the neural network with full cross connections by

$$\Delta\gamma = \sum_{\lambda=j-j_1}^{j-1} H_\lambda, \tag{3.22}$$

while the total number of inputs of the remaining elements of the neural network increases by

$$\Delta\gamma' = \sum_{\lambda=j-j_1+1}^{j} H_\lambda. \tag{3.23}$$

It follows from (3.22) and (3.23) that transferring a neuron from the $j$-th layer into the $(j - j_1)$-th layer decreases the number of input channels of the multilayer neural network if

$$\sum_{\lambda=j-j_1}^{j-1} H_\lambda > \sum_{\lambda=j-j_1+1}^{j} H_\lambda, \quad \text{or} \quad H_{j-j_1} > H_j.$$

It also follows from (3.22) and (3.23) that two structures with the total number of network neurons $H = \lceil \log_2 \Psi \rceil$ (rounded up) satisfy the optimization condition (3.19):

$$H_1 = A; \quad H_j = N, \ j = 2,\dots,W,$$

$$H_j = N, \ j = 1,\dots,W-1; \quad H_W = A.$$

Here $A$ is the remainder of the division of $H$ by $N$, and

$$W = \Bigl\lfloor \frac{H}{N} \Bigr\rfloor + \delta, \qquad \delta = \begin{cases} 0, & A = 0,\\ 1, & A \ne 0. \end{cases}$$

Both structures have the same number of inputs determined by (3.17), since (3.17) depends only on $\sum_j H_j$ and $\sum_j H_j^2$.
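The two structures of Example 2 can be generated and compared in a few lines. This is a sketch: $H$ is computed as $\lceil \log_2 \Psi \rceil$ so that $2^H \ge \Psi$, which (3.16) guarantees when every $H_j \le N$; the function names are hypothetical.

```python
def inputs(N, layers):
    """Total number of neuron inputs (3.17) under full cross connections."""
    total, earlier = 0, 0
    for h in layers:
        total += h * (N + earlier)
        earlier += h
    return total


def example2_structures(N, psi_required):
    """The two structures of Example 2 for N features and a required number of regions."""
    H = (psi_required - 1).bit_length()  # equals ceil(log2(psi_required)) for psi_required >= 1
    A = H % N                            # remainder of H divided by N
    W = H // N + (1 if A else 0)         # W = floor(H / N) + delta
    if A == 0:
        return [N] * W, [N] * W
    return [A] + [N] * (W - 1), [N] * (W - 1) + [A]


first, last = example2_structures(4, 1000)
print(first, last, inputs(4, first), inputs(4, last))  # [2, 4, 4] [4, 4, 2] 72 72
```

The integer `bit_length` trick avoids floating-point rounding in $\log_2$; permuting the layer sizes leaves (3.17) unchanged, which is why both structures need 72 inputs here.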
Example 3. Let us consider the optimization of a neural network structure with respect to the length of connections.
Let us assign a weight $U_{j_1 j}$ to each connection from the $j_1$-th to the $j$-th layer, and denote the length of the connection of the input vector with the $j$-th layer by $U_{0j}$. Then the total length of the connections of the neurons in the $j$-th layer is

$$\gamma_j = H_j U_{0j} N + \sum_{j_1=1}^{j-1} H_{j_1} U_{j_1 j} H_j.$$

The total connection length in the $W$-layer neural network is evidently equal to

$$\gamma_W = \sum_{j=1}^{W} H_j U_{0j} N + \sum_{j=1}^{W}\sum_{j_1=1}^{j-1} H_{j_1} U_{j_1 j} H_j. \tag{3.24}$$

Similarly to (3.18) and (3.19), one can write

$$\Psi^{*}_{N[W]} = \max_{W}\ \max_{H_1,\dots,H_W} \Psi_{N[W]}, \qquad \gamma \ge \sum_{j=1}^{W} H_j U_{0j} N + \sum_{j=1}^{W}\sum_{j_1=1}^{j-1} H_{j_1} U_{j_1 j} H_j, \tag{3.25}$$

$$\gamma^{*}_{W} = \min_{W}\ \min_{H_1,\dots,H_W} \Bigl[\sum_{j=1}^{W} H_j U_{0j} N + \sum_{j=1}^{W}\sum_{j_1=1}^{j-1} H_{j_1} U_{j_1 j} H_j\Bigr], \qquad \Psi_{N[W]} \ge \Psi. \tag{3.26}$$

The number of regions $\Psi_{N[W]}$ in (3.25) and (3.26) is determined by (3.5) or (3.7), depending on the form of the estimate. Note that for $U_{j_1 j} = 1$ ($j_1 = 0, 1,\dots,W-1$; $j = 1,\dots,W$) expression (3.24) coincides with (3.17), and expressions (3.25) and (3.26) coincide with (3.18) and (3.19), respectively.
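Expression (3.24) is straightforward to evaluate. The sketch below stores the lengths as a table `U[j1][j]` (a hypothetical layout: row `j1 = 0` stands for the input vector, layers are numbered from 1) and confirms that unit lengths reproduce the input count (3.17).

```python
def connection_length(N, layers, U):
    """Total connection length (3.24).

    layers[j - 1] is H_j for j = 1..W; U[j1][j] is the length of one connection from
    layer j1 to layer j, with j1 = 0 standing for the input vector (U[j1][0] is unused).
    """
    W = len(layers)
    H = [0] + list(layers)  # 1-based access: H[j] = H_j
    total = 0
    for j in range(1, W + 1):
        total += H[j] * U[0][j] * N
        for j1 in range(1, j):
            total += H[j1] * U[j1][j] * H[j]
    return total


def inputs(N, layers):
    """Total number of inputs (3.17), for comparison."""
    total, earlier = 0, 0
    for h in layers:
        total += h * (N + earlier)
        earlier += h
    return total


layers = [3, 2, 1]
unit = [[1] * (len(layers) + 1) for _ in range(len(layers))]  # all U = 1
print(connection_length(4, layers, unit), inputs(4, layers))  # 35 35
```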


Example 4. Let us now consider the most general limitations on the neural network structure; they include all the aforementioned limitations as particular cases. Let the cost of one neuron be $\beta_1$, the cost of one input $\beta_2$, and the cost of one unit of connection length $\beta_3$. Then, according to (3.17) and (3.24), the total cost takes the form

$$S_W = \beta_1\sum_{j=1}^{W} H_j + \beta_2\Bigl[N\sum_{j=1}^{W} H_j + \frac{1}{2}\Bigl(\sum_{j=1}^{W} H_j\Bigr)^{2} - \frac{1}{2}\sum_{j=1}^{W} H_j^{2}\Bigr] + \beta_3\Bigl[\sum_{j=1}^{W} H_j U_{0j} N + \sum_{j=1}^{W}\sum_{j_1=1}^{j-1} H_{j_1} U_{j_1 j} H_j\Bigr]. \tag{3.27}$$
Let us formulate the problems of neural network synthesis with a limitation on the cost $S_W$ in a form similar to (3.18) and (3.19):

$$\Psi^{*}_{N[W]} = \max_{W}\ \max_{H_1,\dots,H_W} \Psi_{N[W]}, \qquad S_W \le S, \tag{3.28}$$

$$S^{*}_{W} = \min_{W}\ \min_{H_1,\dots,H_W} S_W, \qquad \Psi_{N[W]} \ge \Psi. \tag{3.29}$$

The cost $S_W$ in expressions (3.28) and (3.29) is determined by (3.27), and the number of regions $\Psi_{N[W]}$ is determined by (3.5) or (3.7), depending on the form of the estimate.

All the problem statements considered above for the neural network structure synthesis can be obtained by varying the cost coefficients $\beta_1$, $\beta_2$ and $\beta_3$ in (3.28) and (3.29).
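The way (3.27) contains the earlier criteria as special cases can be shown with a short sketch. The three cost coefficients of (3.27) are passed as `beta_1` (per neuron), `beta_2` (per input) and `beta_3` (per unit of connection length); the names and the table layout `U[j1][j]` (row 0 for the input vector) are hypothetical.

```python
def inputs(N, layers):
    """Total number of inputs (3.17)."""
    total, earlier = 0, 0
    for h in layers:
        total += h * (N + earlier)
        earlier += h
    return total


def connection_length(N, layers, U):
    """Total connection length (3.24); U[j1][j] with j1 = 0 for the input vector."""
    H = [0] + list(layers)
    return sum(H[j] * U[0][j] * N + sum(H[j1] * U[j1][j] * H[j] for j1 in range(1, j))
               for j in range(1, len(H)))


def cost(N, layers, U, beta_1, beta_2, beta_3):
    """(3.27): beta_1 * neurons + beta_2 * inputs + beta_3 * connection length."""
    return (beta_1 * sum(layers)
            + beta_2 * inputs(N, layers)
            + beta_3 * connection_length(N, layers, U))


N, layers = 4, [3, 2, 1]
U = [[2] * 4 for _ in range(3)]  # every connection has length 2 (illustrative)
print(cost(N, layers, U, 1, 0, 0),  # 6: only the number of neurons is limited
      cost(N, layers, U, 0, 1, 0),  # 35: only the number of inputs (3.18)
      cost(N, layers, U, 0, 0, 1))  # 70: only the connection length (3.25)
```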


3.6
Optimization of a Multilayer Neural Network Structure
with $K_p$ Solutions


The elements of a multilayer neural network with full cross connections are described by the expressions given in the previous chapter. Each element realizes an assemblage of parallel dividing hyperplanes in its feature space. The maximum number of regions selected in the initial feature space by the equivalent dividing surface does not exceed $K_p^{H}$, where $H$ is the number of network neurons. This estimate is achieved only for a multilayer neural network with full cross connections.

Let us estimate the number of regions that can emerge from partitioning the $N$-dimensional feature space by $H_p$ groups of hyperplanes with $(K_p - 1)$ hyperplanes in each group. Let us denote the maximum number of regions selected by $H_p$ groups as $\Psi^{K_p}_{N[H_p]}$. Then, similarly to (3.3) and (3.4),

$$\Psi^{K_p}_{N[H_p]} = \Psi^{K_p}_{N[H_p - 1]} + z.$$

Let us estimate $z$. When each of the $(K_p - 1)$ hyperplanes of the new group is placed, the number of selected regions increases by the number of regions formed on this hyperplane by the lines of its intersection with the other hyperplanes, i.e., by $\Psi^{K_p}_{N-1[H_p - 1]}$.

Consequently,

$$z = (K_p - 1)\,\Psi^{K_p}_{N-1[H_p - 1]},$$

and finally

$$\Psi^{K_p}_{N[H_p]} = \Psi^{K_p}_{N[H_p - 1]} + (K_p - 1)\,\Psi^{K_p}_{N-1[H_p - 1]} \tag{3.30}$$

with the initial conditions

$$\Psi^{K_p}_{N[0]} = 1; \qquad \Psi^{K_p}_{1[H_p]} = H_p (K_p - 1) + 1. \tag{3.31}$$

It can be shown from (3.30) and (3.31) that

$$\Psi^{K_p}_{N[H_p]} = K_p^{H_p} \ \text{ when } H_p \le N, \qquad \Psi^{K_p}_{N[H_p]} < K_p^{H_p} \ \text{ when } H_p > N. \tag{3.32}$$
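The recurrence (3.30) with the initial conditions (3.31) is easy to evaluate directly. The sketch below also compares it with the closed form $\sum_{k=0}^{N}\binom{H_p}{k}(K_p - 1)^k$; this closed form is our own cross-check rather than a formula from the text (it satisfies the same recurrence and initial conditions, reproduces (3.32), and for $K_p = 2$ reduces to the ordinary count of regions cut by $H_p$ hyperplanes).

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def psi_k(N, H, K):
    """Regions cut from R^N (N >= 1) by H groups of K - 1 parallel hyperplanes, by (3.30)-(3.31)."""
    if H == 0:
        return 1                      # no hyperplanes: a single region
    if N == 1:
        return H * (K - 1) + 1        # initial condition (3.31)
    return psi_k(N, H - 1, K) + (K - 1) * psi_k(N - 1, H - 1, K)  # recurrence (3.30)


def psi_k_closed(N, H, K):
    """Cross-check: sum over k <= N of C(H, k) * (K - 1)**k."""
    return sum(comb(H, k) * (K - 1) ** k for k in range(N + 1))


print(psi_k(3, 2, 3), psi_k(2, 3, 3))  # 9 = 3**2 (H <= N) and 19 < 3**3 (H > N)
```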

Let us consider the $h_j$-th neuron with $K_p$ solutions located in the $j$-th layer of the multilayer neural network with full cross connections. The input signals of the $h_j$-th neuron can be divided into two groups: the input signal vector $x = [x_1,\dots,x_N]$ and the row vector $y$ of output and intermediate signals of the $(j-1)$-layer neural network. Let us assume that the $(j-1)$-layer neural network selects $\Psi^{K_p}_{N[j-1]}$ regions in the initial feature space.

Then $\Psi^{K_p}_{N[j-1]}$ different variants of the vector $y$ can arrive through the $y$-channels at the inputs of the $h_j$-th neuron. The output signal of the $h_j$-th neuron with $K_p$ solutions can be written as

$$y_{h_j} = F_{K_p}\bigl(A_{h_j} x + A'_{h_j} y\bigr), \qquad h_j = 1,\dots,H_j, \tag{3.33}$$

where $A_{h_j}$ and $A'_{h_j}$ are the vectors of weighting coefficients for $x$ and $y$, respectively. According to (3.33), each of the $H_j$ neurons with $K_p$ solutions geometrically realizes $(K_p - 1)$ parallel hyperplanes in the space of the neural network input signals. The upper estimate of the number of regions that emerge from the decomposition of the space $X$ by the considered $j$-layer neural network has the form

$$\Psi^{K_p}_{N[j]} = \Psi^{K_p}_{N[j-1]}\,\Psi^{K_p}_{N[H_j]}. \tag{3.34}$$

Here $\Psi^{K_p}_{N[H_j]}$ is determined by (3.30) and (3.31). Since (3.34) is a recurrent expression and the first neuron layer with $K_p$ solutions decomposes the space $X$ into $\Psi^{K_p}_{N[H_1]}$ regions, expression (3.34) can be rewritten as

$$\Psi^{K_p}_{N[W]} = \prod_{j=1}^{W} \Psi^{K_p}_{N[H_j]}. \tag{3.35}$$

Expression (3.35) allows one to formulate and solve the problem of synthesizing the neural network structure that is optimal by the upper estimate of the number of regions under a limitation on the total number $H$ of neurons in the network. It follows from (3.35) and (3.32) that in the $W$-layer neural network

$$\Psi^{K_p}_{N[W]} = K_p^{H} \ \text{ if } H_j \le N, \qquad \Psi^{K_p}_{N[W]} < K_p^{H} \ \text{ if } H_j > N \quad (j = 1,\dots,W). \tag{3.36}$$
Consequently, a neural network with full cross connections is optimal by the upper estimate of the number of regions if the number of neurons with $K_p$ solutions in each of its layers does not exceed the dimensionality of the initial feature space.
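The conclusion can be illustrated by evaluating the product (3.35) for two structures with the same total number of neurons. The sketch reuses the recurrence (3.30)-(3.31); the function names and the chosen layer sizes are illustrative.

```python
from functools import lru_cache
from math import prod


@lru_cache(maxsize=None)
def psi_k(N, H, K):
    """(3.30)-(3.31): regions from H groups of K - 1 parallel hyperplanes in R^N (N >= 1)."""
    if H == 0:
        return 1
    if N == 1:
        return H * (K - 1) + 1
    return psi_k(N, H - 1, K) + (K - 1) * psi_k(N - 1, H - 1, K)


def network_regions(N, layers, K):
    """Upper estimate (3.35): product over layers of psi_k(N, H_j, K)."""
    return prod(psi_k(N, h, K) for h in layers)


N, K = 3, 3
print(network_regions(N, [3, 3, 1], K))  # 2187 = 3**7: every H_j <= N
print(network_regions(N, [4, 3], K))     # 1755 < 3**7: H_1 > N
```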
Chapter 4


Continual Neural Networks 


A large number of parameters characterizing the input signal must be taken into account when developing multilayer neural network structures. An example is the design of a pattern recognition system required to maximize the probability of correct recognition [4-1]. It is proposed to take into account the continual properties of multilayer neural network characteristics both in its mathematical modeling and in its technological implementation.


4.1 
Neurons with Continuum Input Features 


The transfer to continuum features becomes important when the dimensionality $N$ of the feature vector $x$ becomes large (several hundreds or thousands). In this case the feature vector $\{x_i, i = 1,\dots,N\}$ is replaced by a function of a continuous argument $\{x(i), i \in I\}$, and the weighting vector $\{a_m, m = 1,\dots,M\}$ is replaced by the weighting function $\{a(m), m \in M\}$. Similarly to the classic discrete case [4-2], the neuron model with a continuum of features is determined by the expression

$$y = \operatorname{sign}\Bigl[\int_{M} a(m)\,x(m)\,dm + a_0\Bigr], \tag{4.1}$$

where $y$ is the neuron output signal, $x(m)$ is the neuron input signal, and $a_0$ is the threshold value.

The transfer to the feature continuum at the input of the first neural network layer often removes the need to quantize the input signal (for example, an electric signal or a pattern). The method of technological implementation of the neuron weighting function is selected according to the specific physical type of the input signal. For example, when the input signal is an electric signal varying in time, the weighting function must also be generated as an electric signal; in the case of optical patterns, the weighting function can be realized on a photomask. For a discrete set of neurons with a continuum feature space,

$$y_{m_2}(n) = \operatorname{sign}\Bigl[\int_{M_1} x(m_1, n)\,a_{m_2}(m_1)\,dm_1 + a_{0 m_2}\Bigr]. \tag{4.2}$$

The neural network structure according to (4.2) is shown in Fig. 4.1.
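For numerical experiments, a continuum neuron can be simulated by sampling the feature argument on a grid and replacing the integral in (4.1) by a quadrature. This is only a simulation sketch under that assumption (the text itself targets direct physical implementation with electric or optical signals); the grid, the trapezoidal rule, the convention sign(0) = +1 and the sample functions are our choices.

```python
import numpy as np


def trapezoid(values, grid):
    """Trapezoidal rule on a sampled grid (written out to avoid NumPy version differences)."""
    return float(np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0)


def continuum_neuron(x, a, a0, grid):
    """(4.1): y = sign(integral over M of a(m) x(m) dm + a0), with sign(0) taken as +1."""
    s = trapezoid(a(grid) * x(grid), grid)
    return 1 if s + a0 >= 0 else -1


def continuum_layer(x, weight_fns, thresholds, grid):
    """(4.2): a discrete set of neurons, neuron m2 having weighting function a_m2(m1)."""
    return [continuum_neuron(x, a, a0, grid) for a, a0 in zip(weight_fns, thresholds)]


grid = np.linspace(0.0, 1.0, 1001)   # sampled continuum of features m1 in [0, 1]
x = lambda m: m                      # hypothetical input signal x(m1) = m1
flat = lambda m: np.ones_like(m)     # hypothetical weighting functions
ramp = lambda m: 1.0 - 2.0 * m
print(continuum_layer(x, [flat, flat, ramp], [-0.4, -0.6, 0.0], grid))  # [1, -1, -1]
```

With $a(m) = 1$ the integral is 0.5, so thresholds $-0.4$ and $-0.6$ give opposite outputs; the ramp weighting gives an integral of $-1/6$ and hence $-1$.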


Fig. 4.1. Neural network layer with continuum input features


4.2 
Continuum of Neurons in the Layer 


The transfer to a continuum space of layer output signals can be interpreted as a continuum of neurons in the layer. The output signal is then realized not as a finite-dimensional vector, for example consisting of +1 and -1 as shown in Fig. 4.1, but as a signal $y(m_2, n)$ taking the values +1 and -1 over the interval of variation of the continuous argument $m_2$. Consequently, taking (4.2) into account, the output signal is an infinite-dimensional vector with components

$$y(m_2, n) = \operatorname{sign}\Bigl[\int_{M_1} x(m_1, n)\,a(m_1, m_2)\,dm_1 + a_0(m_2)\Bigr]. \tag{4.3}$$

Expression (4.3) forms the basis for the physical implementation of open-loop systems of the type considered.
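On sampled grids, (4.3) becomes a matrix-vector product followed by the sign nonlinearity. The sketch below samples both $m_1$ and $m_2$, uses the trapezoidal rule over $m_1$, and takes sign(0) = +1; the kernel, input and threshold functions are illustrative assumptions.

```python
import numpy as np


def continuum_neuron_layer(x_vals, kernel, a0_vals, m1):
    """(4.3) on sampled grids.

    x_vals[i]    ~ x(m1_i, n) for the current period n
    kernel[i, k] ~ a(m1_i, m2_k)
    a0_vals[k]   ~ a0(m2_k)
    Returns y(m2_k, n) in {+1, -1} for every sampled m2_k (sign(0) taken as +1).
    """
    integrand = kernel * x_vals[:, None]                                   # a(m1, m2) * x(m1, n)
    widths = np.diff(m1)[:, None]
    s = np.sum((integrand[1:] + integrand[:-1]) * widths, axis=0) / 2.0  # trapezoid over m1
    return np.where(s + a0_vals >= 0, 1, -1)


m1 = np.linspace(0.0, 1.0, 501)
m2 = np.linspace(0.0, 1.0, 201)
x_vals = np.ones_like(m1)                                   # hypothetical x(m1, n) = 1
kernel = np.ones((m1.size, m2.size))                        # hypothetical a(m1, m2) = 1
y = continuum_neuron_layer(x_vals, kernel, -2.0 * m2, m1)  # a0(m2) = -2 * m2
```

The result is a two-valued function of $m_2$ that switches from +1 to -1 near $m_2 = 0.5$: a finite-grid stand-in for the infinite-dimensional output vector.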


4.3 
Continuum Neurons in the Layer and Discrete Feature Set 


In the particular case of a discrete feature set and a continuum of neurons in the layer, i.e., when the continuous variable $m_1$ is replaced by a discrete one, expression (4.3) takes the form

$$y(m_2, n) = \operatorname{sign}\Bigl[\sum_{m_1=1}^{M_1} a_{m_1}(m_2)\,x_{m_1}(n) + a_0(m_2)\Bigr]. \tag{4.4}$$

This expression is the basis for the implementation of the neural network, a particular case of which is shown in Fig. 4.2.

The output signal of the neuron layer is an electric signal whose form carries the features at the $n$-th period. The electric signals $a_{m_1}(m_2)$ and $a_0(m_2)$ are generated inside the system at each $n$-th step. Unlike in a system for recognizing signal forms, their period is determined a priori. An electric signal taking two values (1, -1) over the interval $(0, M_2)$ appears at the layer output (Fig. 4.3).

Such a layer model with a continuum of neurons is adequate for the neurophysiological neuron model under pulse-frequency modulation of the layer output signal.
Fig. 4.2. Discrete feature set. Continuum of neurons in the layer

Fig. 4.3. Form of the neuron layer output signal $y(m_2, n)$
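Expression (4.4) and the two-valued output of Fig. 4.3 can be imitated by evaluating the sum at sample points of $m_2$. In this sketch the weighting functions $a_{m_1}(m_2)$ are arbitrary illustrative choices (a cosine and a sine), not functions from the text, and sign(0) is taken as +1.

```python
import numpy as np


def discrete_feature_layer(x, weight_fns, a0, m2):
    """(4.4): y(m2, n) = sign(sum over m1 of a_m1(m2) * x_m1(n) + a0(m2)).

    x          -- the M1 discrete features x_m1(n) of the current period
    weight_fns -- one function a_m1(m2) per feature
    a0         -- threshold function a0(m2)
    m2         -- points of the neuron continuum at which the output is evaluated
    """
    m2 = np.asarray(m2, dtype=float)
    s = sum(xi * a(m2) for xi, a in zip(x, weight_fns)) + a0(m2)
    return np.where(s >= 0, 1, -1)


weights = [lambda m: np.cos(2 * np.pi * m), lambda m: np.sin(2 * np.pi * m)]  # hypothetical
y = discrete_feature_layer([1.0, 0.0], weights, np.zeros_like, [0.1, 0.5, 0.9])
```

With the feature vector $[1, 0]$ the output follows the sign of $\cos 2\pi m_2$, giving +1, -1, +1 at the three sample points: a rectangular two-valued signal over the interval, as in Fig. 4.3.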


4.4 
Classification of Continuum Neuron Layer Models 


The systems considered below serve to recognize one-channel electric signals of the "bump" type, i.e., a signal locked to the onset of the bump, and a periodic signal locked at each period (Fig. 4.4).


4.4.1 
Discrete Set of Neurons 


The basis of the neural network implementation in this case is expression (4.2), in which $y_{m_2}(n)$ takes two values (1, -1), $x(m_1, n)$ and $a_{m_2}(m_1)$ are electric signals at the $n$-th period, and $a_{0 m_2}$ is a constant coefficient.


4.4.2 
One-Dimensional and Two-Dimensional m, Feature Space 


The basis of the neural network implementation in this case is expression (4.3). The difficulty for implementation is a multiplier unit that multiplies a continuous function of one variable by a continuous function of two variables, one of which coincides with the variable of the first function. The requirement of a continuum-type layer input space makes it necessary to realize the function $x(m_1, n)$ as an electric signal. At the same time, the requirement of a continuum-type layer output space makes it necessary to realize the function $a(m_1, m_2)$ as an image. The main difficulty is the multiplication of the functions $x(m_1, n)$ and $a(m_1, m_2)$.

One can assume the physical existence of an optical element whose dimness (or brightness) changes in real time along one of the coordinates depending on the form of the applied voltage, and whose output voltage depends on the dimness (or brightness) distribution along that coordinate when the intensity is integrated along the other one. Such an element can be conditionally called a "spatial optical coupler". The implementation of this network could then have the form shown in Fig. 4.5, where the layer input electric signal would be applied to the vertical plate of the input element and the output signal would be read from the horizontal plate of the output element.

Consequently, it is difficult to consider a two-dimensional $m_2$ at the present level of technological development, because it requires the physical implementation of the three-dimensional function $a(m_1, m_2)$.

Fig. 4.4. Types of signals received by the neuron layer in the case of a one-dimensional one-channel feature continuum $m_1$: a - bump signal; b - periodic signal

Fig. 4.5. Hypothetical variant of an optical system implementation in the case of one-dimensional one-channel $m_1$ and two-dimensional $m_2$ (labelled in the figure: clock pulses erasing the image)


4.4.3 
Continuum of Features - One-Dimensional $m_1$ for Several Channels


This case is of great practical importance, because it is often necessary to recognize a multi-channel signal, for example, in EEG recognition, when several leads are analyzed simultaneously (Fig. 4.6).

Expression (4.2) can be used for the neural network implementation in this case. The neuron structure is shown in Fig. 4.7, and the neuron layer is a parallel connection of neurons with common inputs (Fig. 4.8).


Fig. 4.6. Types of signals received by the layer of neurons in the case of a feature continuum and one-dimensional multi-channel $m_1$: a - bump signal; b - periodic signal

Fig. 4.8. Structure of a multi-channel neuron layer with a discrete set of neurons in the case of a feature continuum and one-dimensional feature space


4.4.4 
Feature Continuum - Two-Dimensional $m_1$


The problem of image recognition corresponds to the variant with a discrete set of neurons in the layer and two-dimensional $m_1$. Taking (4.2) into account, the neuron layer structure in the case of a feature continuum and two-dimensional $m_1$ can be represented as shown in Fig. 4.9. The multiplication of images can be realized with optical fibers, with holographic methods, or with a system of mirrors.


4.4.5 
Neuron Layer with a Continuum of Output Values 


In this case, the output space remains, at its basis, a space of features, but each feature varies over some interval of a continuous set of values determined by the activation function, i.e., by the nonlinear transformation at the neuron output. This also concerns systems (4.2) shown in Figs. 4.1 and 4.9.

For systems (4.4), the output electric signal has the form of a function of one variable taking two values (1, -1) over some interval of its variation. The transfer to neurons with a continuum of solutions results in an output electric signal whose amplitude varies continuously.


Fig. 4.9. Block-scheme of a physical implementation of a neuron layer with a feature continuum, two-dimensional $m_1$ and a discrete set of elements in the layer


A multilayer neural network for solving different problems of vector, signal and pattern processing can be designed as various combinations of the aforementioned continuum systems. The following problems are promising in this field:


- Search for methods of physical implementation of different types of continuum models of multilayer neural networks;

- Development of structure synthesis methods for continuum models of multilayer neural networks according to different criteria;

- Development of other a priori given structures of continuum models of a multilayer neural network (systems of cross connections and backward connections, etc.);

- Use of "human-machine" systems for the synthesis of continuum multilayer network structures;

- Introduction of new continuum properties of multilayer neural networks (a continuum number of layers, etc.).


The transfer to a continuum feature space and a continuum set of neurons in the layer becomes important when the number of features becomes large (several hundreds or thousands).

The problem of transfer to a continuum number of layers is less significant. A preliminary analysis of this problem reveals fundamental mathematical difficulties for such a transfer. This follows from the expression for the output signal of a three-layer system:

$$y_3 = \operatorname{sign}\Bigl[\sum_{m_3=1}^{M_3} a_{m_3}\,\operatorname{sign}\Bigl(\sum_{m_2=1}^{M_2} a_{m_2}\,\operatorname{sign}\Bigl(\sum_{m_1=1}^{M_1} a_{m_1} x_{m_1}\Bigr)\Bigr)\Bigr].$$

The transfer to a continuum number of layers is complicated by the nonlinear transducers at the output of each layer and by the difficulty of selecting a method of physical implementation of an open-loop system adequate for such a model.
Chapter 5


Investigation of Neural Network Input 
Signal Characteristics 


5.1 
Problem Statement 


A neural network can be represented as an equivalent system that adapts to external conditions. A general block-scheme of such a system is shown in Fig. 5.1, where $x(n)$ is a multidimensional stochastic process in the form of a pattern sequence at the neural network input and $n$ is a discrete argument.

The signal $\varepsilon(n)$ is determined by the supervisor instructions on whether the current pattern at the neural network input belongs to a particular class. Each class includes some set of patterns possessing a common property. The multidimensional output signal $y(n)$ of the recognition system is generated as the neural network's decision on whether the current pattern belongs to a particular region of the solution space. The three spaces considered, $X$, $E$ and $Y$, are respectively the spaces of patterns, supervisor instructions and neural network output signals. The neural network parameter adjustment unit determines the vector $a(n)$ of adjustable coefficients and the information on the structure of the transformation into $y(n)$, i.e., the dependence of the neural network output signal on the input one; $g(n)$ is the vector of intermediate neural network signals.

The input signal of the neural network is $[x(n), \varepsilon(n)]$. One of its characteristics is the number of gradations of the $\varepsilon(n)$ signal level, which is determined by the number of pattern classes. The signal $x(n)$ of dimensionality $N$ can in general be discrete or continuous in amplitude. If $\varepsilon(n)$ is a one-dimensional signal with its level discretized into two or $K$ gradations, then 2 or $K$ pattern classes, respectively, are considered. If the vector $\varepsilon(n)$ has dimensionality $N^{*}$ and each of its components has $K_{\varepsilon}$ amplitude gradations, then the number of classes is

$$K = (K_{\varepsilon})^{N^{*}}.$$


Fig. 5.1. Block-scheme of the system (blocks: Neural Network, Adaptation Block; signals $x(n)$, $y(n)$, $\varepsilon(n)$, $a(n)$)
Fig. 5.2. The formation of a feature space and supervisor instructions in the problem of device reliability forecasting (curves $x_j(t_0)$)


A class continuum is considered when the signal $\varepsilon(n)$ is continuous. The neural network adjustment can then be regarded as the problem of estimating some continuous parameter $\varepsilon$ of the distribution $f(x, \varepsilon)$ of a stochastic process.

A particular adjustment (learning) problem can be illustrated by the following example. Let us consider the formation of the neural network input signal in the case of a feature continuum in the problem of forecasting the reliability of a device.

In Fig. 5.2, $x_j(t_0)$ are the curves of the time variation of some device parameter that serves as an indicator of the device reliability in the test; $j$ is the number of the tested device; $x_0$ is the acceptable value of the parameter.

The point at which the curve $x_j(t_0)$ intersects the level $x_0$ determines the non-failure operating time of the device. The vector $x_j(n)$ corresponding to each curve is obtained by discretizing the curve in time over the interval $[0, T_y]$, where $T_y$ is the device testing time. The components of the vector are the ordinates $x_j(t_0)$ at the discretization points. This procedure forms the feature space in this problem.

The supervisor instruction can be formed as follows. The required operating life of the device is given a priori. Vectors $x_j(n)$ whose curves $x_j(t_0)$ intersect $x_0$ before this operating life belong to the first (failure) class of devices, and those whose curves intersect it later belong to the second (serviceable) class. Accordingly, two amplitude gradations of the signal $\varepsilon(n)$ are introduced (1, -1). The $t_0$ axis can also be partitioned into $K$ intervals, each indicating a device type; the signal $\varepsilon(n)$ then has $K$ amplitude gradations (for example, $\varepsilon = 1,\dots,K$), and each vector $x_j(n)$ has its own value $\varepsilon$.

In the extreme case, when the $t_0$ axis is not partitioned, the supervisor instruction on the device's non-failure operating time has a continuous distribution.
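The labelling rule just described can be written down directly. This sketch assumes the parameter degrades downward, so a failure is the first sample at or below $x_0$, and that the first (failure) class is coded $\varepsilon = -1$, following the convention used in Section 5.2. Function names and the sample curves are hypothetical; the feature vector $x_j(n)$ is simply the sampled curve.

```python
def failure_time(times, values, x0):
    """First sampled time at which the parameter has fallen to the acceptable level x0, else None."""
    for t, v in zip(times, values):
        if v <= x0:
            return t
    return None


def two_class_label(times, values, x0, required_life):
    """-1 for the first (failure) class, +1 for the second (serviceable) class."""
    t = failure_time(times, values, x0)
    return -1 if t is not None and t < required_life else 1


def k_class_label(times, values, x0, edges):
    """Label 1..K from K - 1 increasing interval boundaries on the time axis; no failure -> K."""
    t = failure_time(times, values, x0)
    if t is None:
        return len(edges) + 1
    return 1 + sum(1 for e in edges if t >= e)


times = list(range(11))                      # sampling instants over the test interval
device_a = [10 - 2.0 * t for t in times]     # hypothetical fast degradation
device_b = [10 - 0.5 * t for t in times]     # hypothetical slow degradation
print(two_class_label(times, device_a, 4.0, 5), two_class_label(times, device_b, 4.0, 5))
print(k_class_label(times, device_a, 4.0, [2, 4, 6, 8]), k_class_label(times, device_b, 4.0, [2, 4, 6, 8]))
```

Device A reaches the level at $t = 3$, before the required life of 5, so it falls in the failure class; device B never reaches it during the test and lands in the last of the $K = 5$ intervals.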

The following characteristics can be the object of analysis in each particular case: the joint distribution $f(x, \varepsilon)$, the conditional distribution $f'(x/\varepsilon)$ of the pattern assemblage $x$ under a given instruction $\varepsilon$ on its belonging to the $k$-th class, the distributions of the signals $f(x)$ and $f_\varepsilon(\varepsilon)$ and their various moments, etc.

The notion of teacher (supervisor) qualification allows one to develop a unified approach to the problems of learning and self-learning. On the basis of this unified approach, the joint distribution of the input patterns and of the instruction signal about their belonging to some class is represented below.


5.2
Joint Probability Distribution of the Input Signal for Two Pattern Classes


In the neural network learning process, the belonging of the learning sample patterns to a particular class is known with probability 1, i.e., the teacher (supervisor) gives precise instructions concerning the learning sample. In the self-learning process, the signals of the learning sample are not accompanied by instructions about their belonging, and the probability is less than 1; in the simplest case of a two-mode distribution it is 0.5. Let us denote by $a$ the probability that the supervisor instruction assigns a pattern to its class.

It is important to analyze the intermediate modes of transfer from the problem of learning to the problem of self-learning and vice versa in the algorithm block-scheme shown in Fig. 5.3.

The transfer must be performed by changing the probability that a learning sample member belongs to a particular class in the range from $a = 1$ to $a = 0.5$ (and vice versa).

The following reasons make it necessary to analyze such intermediate modes: 


1. Development of a unified approach to the analysis and synthesis of learning and 
self-learning modes for pattern recognition systems; 

2. Solution of some practical problems, one of which is learning with a teacher of incomplete qualification.


The joint distribution $f(x, \varepsilon)$ of the signals $x(n)$ and $\varepsilon(n)$ is

$$f(x, \varepsilon) = \begin{cases} p_1 (1 - a) f_1(x) + p_2 a f_2(x), & \varepsilon = 1,\\ p_1 a f_1(x) + p_2 (1 - a) f_2(x), & \varepsilon = -1, \end{cases} \tag{5.1}$$

where $p_1$ and $p_2$ are the a priori probabilities of the first and second classes, and $f_1(x)$ and $f_2(x)$ are the distributions of the signals $x_1(n)$ and $x_2(n)$ representing patterns of the first and second classes.

Distribution (5.1) is discrete-continuous owing to the discrete form of $\varepsilon(n)$. However, it can be written in continuous form using delta functions. Below, the discrete representation is taken into account by replacing integration operations with summation operations.

The level of teacher qualification $b$ is introduced as follows [5-1, 5-2]:

$$b = 2a - 1. \tag{5.2}$$


Fig. 5.3. Structure of the mathematical model for the unified approach to the problems of learning and self-learning in pattern recognition systems (blocks: unit for changing the probabilities that a sample member belongs to one or the other class; unit for placing the dividing surface according to some criterion; inputs: 1st class sample, 2nd class sample; output: parameters of the dividing surface)
Consequently, $b = 1$ when $a = 1$, and the teacher qualification is complete; $b = 0$ when $a = 0.5$, and the teacher qualification is zero.
Taking (5.2) and (5.1) into account, one obtains

$$f(x, \varepsilon) = \begin{cases} p_1 \dfrac{1 - b}{2} f_1(x) + p_2 \dfrac{1 + b}{2} f_2(x), & \varepsilon = 1,\\[6pt] p_1 \dfrac{1 + b}{2} f_1(x) + p_2 \dfrac{1 - b}{2} f_2(x), & \varepsilon = -1. \end{cases} \tag{5.3}$$


Figure 5.3 shows the structure of a mathematical model generating the neural network input in the case considered. Expressions (5.3) and (5.2) refer to an equal teacher qualification level for both sample classes. At $b = 1$, equation (5.3) gives the following joint distribution law of the input signal in the neural network learning mode:

$$f(x, \varepsilon) = \begin{cases} p_2 f_2(x), & \varepsilon = 1,\\ p_1 f_1(x), & \varepsilon = -1. \end{cases} \tag{5.4}$$

The joint distribution law of the input signal in the self-learning mode at $b = 0$ has the form

$$f(x, \varepsilon) = \frac{p_1}{2} f_1(x) + \frac{p_2}{2} f_2(x), \qquad \varepsilon = \pm 1.$$


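Here the supervisor instruction signal $\varepsilon(n)$ gives no information about the belonging of a pattern to any class, because the conditional distributions $f'(x/\varepsilon = 1)$ and $f'(x/\varepsilon = -1)$ are equal.

At $b = -1$, expression (5.3) gives

$$f(x, \varepsilon) = \begin{cases} p_1 f_1(x), & \varepsilon = 1,\\ p_2 f_2(x), & \varepsilon = -1. \end{cases}$$

In this case the teacher intentionally performs an incorrect classification (the teacher is a "saboteur").
By the definition of conditional probability,

$$f'(x/\varepsilon) = \frac{f(x, \varepsilon)}{f_\varepsilon(\varepsilon)}, \tag{5.5}$$

where

$$f_\varepsilon(\varepsilon) = \int_{-\infty}^{+\infty} f(x, \varepsilon)\,dx.$$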


Integration of (5.3) gives the following distribution of the supervisor instruction:

$$f_\varepsilon(\varepsilon) = \begin{cases} \tfrac{1}{2}\bigl[1 - b(p_1 - p_2)\bigr], & \varepsilon = 1,\\[4pt] \tfrac{1}{2}\bigl[1 + b(p_1 - p_2)\bigr], & \varepsilon = -1. \end{cases} \tag{5.6}$$
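The marginal (5.6) can be confirmed by integrating (5.3) numerically. This sketch assumes Gaussian class densities $f_1$, $f_2$ with unit variance (an illustrative choice only) and uses a plain trapezoidal rule over a wide interval.

```python
import math


def gauss(x, mu, sigma=1.0):
    """Normal density, used here as a hypothetical class distribution f_k(x)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def joint(x, eps, b, p1, p2, f1, f2):
    """(5.3): joint density of the pattern x and the instruction eps in {+1, -1}."""
    if eps == 1:
        return p1 * (1 - b) / 2 * f1(x) + p2 * (1 + b) / 2 * f2(x)
    return p1 * (1 + b) / 2 * f1(x) + p2 * (1 - b) / 2 * f2(x)


def instruction_marginal(eps, b, p1, p2, f1, f2, lo=-15.0, hi=15.0, n=6000):
    """Integrate (5.3) over x with the trapezoidal rule; should reproduce (5.6)."""
    h = (hi - lo) / n
    vals = [joint(lo + i * h, eps, b, p1, p2, f1, f2) for i in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


f1 = lambda x: gauss(x, -1.0)   # first class
f2 = lambda x: gauss(x, 2.0)    # second class
p1, p2, b = 0.3, 0.7, 0.6
numeric = instruction_marginal(1, b, p1, p2, f1, f2)
predicted = 0.5 * (1 - b * (p1 - p2))   # (5.6), eps = +1
print(numeric, predicted)
```

Both numbers agree at 0.62 to well within $10^{-6}$; the two branches together integrate to 1, as a distribution of $\varepsilon$ must.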


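Equations (5.6), (5.3) and (5.5) give

$$f'(x/\varepsilon) = \begin{cases} \dfrac{p_1 (1 - b) f_1(x) + p_2 (1 + b) f_2(x)}{1 - b(p_1 - p_2)}, & \varepsilon = 1,\\[8pt] \dfrac{p_1 (1 + b) f_1(x) + p_2 (1 - b) f_2(x)}{1 + b(p_1 - p_2)}, & \varepsilon = -1. \end{cases} \tag{5.7}$$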


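The conditional distribution law $f''(\varepsilon/x)$ is defined similarly:

$$f''(\varepsilon/x) = \frac{f(x, \varepsilon)}{f(x)}, \qquad f(x) = \int_{-\infty}^{+\infty} f(x, \varepsilon)\,d\varepsilon = \sum_{k=1}^{2} p_k f_k(x),$$

where the integration over the discrete argument $\varepsilon$ is replaced by summation. After substitution and integration one obtains

$$f''(\varepsilon/x) = \begin{cases} \dfrac{p_1 \frac{1 - b}{2} f_1(x) + p_2 \frac{1 + b}{2} f_2(x)}{p_1 f_1(x) + p_2 f_2(x)}, & \varepsilon = 1,\\[10pt] \dfrac{p_1 \frac{1 + b}{2} f_1(x) + p_2 \frac{1 - b}{2} f_2(x)}{p_1 f_1(x) + p_2 f_2(x)}, & \varepsilon = -1. \end{cases} \tag{5.8}$$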

It follows from (5.7) and (5.8) that at $b = 0$

$$f'(x/\varepsilon) = f(x), \qquad f''(\varepsilon/x) = f_\varepsilon(\varepsilon),$$

which indicates the statistical independence of the signals $x(n)$ and $\varepsilon(n)$ at the neural network input in the self-learning mode.
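The conditional laws (5.7) and (5.8) and their collapse at $b = 0$ can be checked pointwise. As before, the Gaussian class densities are an illustrative assumption; the sketch also confirms that (5.7) equals (5.3) divided by (5.6), as (5.5) requires.

```python
import math


def gauss(x, mu, sigma=1.0):
    """Normal density, a hypothetical choice of class distribution."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def mixture(x, p1, p2, f1, f2):
    """Unconditional pattern density f(x) = p1 f1(x) + p2 f2(x)."""
    return p1 * f1(x) + p2 * f2(x)


def joint(x, eps, b, p1, p2, f1, f2):
    """(5.3)."""
    if eps == 1:
        return p1 * (1 - b) / 2 * f1(x) + p2 * (1 + b) / 2 * f2(x)
    return p1 * (1 + b) / 2 * f1(x) + p2 * (1 - b) / 2 * f2(x)


def cond_x_given_eps(x, eps, b, p1, p2, f1, f2):
    """(5.7): f'(x/eps)."""
    if eps == 1:
        return (p1 * (1 - b) * f1(x) + p2 * (1 + b) * f2(x)) / (1 - b * (p1 - p2))
    return (p1 * (1 + b) * f1(x) + p2 * (1 - b) * f2(x)) / (1 + b * (p1 - p2))


def cond_eps_given_x(eps, x, b, p1, p2, f1, f2):
    """(5.8): f''(eps/x) = f(x, eps) / f(x)."""
    return joint(x, eps, b, p1, p2, f1, f2) / mixture(x, p1, p2, f1, f2)


f1 = lambda x: gauss(x, -1.0)
f2 = lambda x: gauss(x, 2.0)
p1, p2 = 0.3, 0.7
for x in (-2.0, 0.0, 1.5):
    # At b = 0 the conditionals reduce to f(x) and f_eps(eps) = 1/2 (independence).
    print(x, cond_x_given_eps(x, 1, 0.0, p1, p2, f1, f2), mixture(x, p1, p2, f1, f2),
          cond_eps_given_x(1, x, 0.0, p1, p2, f1, f2))
```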

Let us denote the joint moment of the $j$-th order of the multidimensional stochastic process $x(n)$ as $\alpha_j$:

$$\alpha_j = \int\!\cdots\!\int x_{k_1} \cdots x_{k_j}\, f(x)\, dx_{k_1} \cdots dx_{k_j}, \qquad k_1,\dots,k_j = 1,\dots,N.$$
Then the moments of distribution (5.3) have the form

$$\overline{\varepsilon^{i} x^{j}} = 1^{i}\Bigl[p_1 \frac{1 - b}{2} \alpha_{j1} + p_2 \frac{1 + b}{2} \alpha_{j2}\Bigr] + (-1)^{i}\Bigl[p_1 \frac{1 + b}{2} \alpha_{j1} + p_2 \frac{1 - b}{2} \alpha_{j2}\Bigr],$$

where $\alpha_{j1}$ and $\alpha_{j2}$ are the joint moments of the $j$-th order for the assemblages of patterns of the first and second classes. For even and odd $i$, respectively, one obtains

$$\overline{\varepsilon^{i} x^{j}} = p_1 \alpha_{j1} + p_2 \alpha_{j2}, \qquad \overline{\varepsilon^{i} x^{j}} = b\,(p_2 \alpha_{j2} - p_1 \alpha_{j1}).$$

Consequently, the teacher qualification influences the moments of distribution $f(x, \varepsilon)$ only at odd $i$.
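The even/odd split can be reproduced in exact arithmetic by summing the two branches of (5.3) with the weights $1^i$ and $(-1)^i$. The class moments and probabilities below are arbitrary illustrative values, and the function name is hypothetical.

```python
from fractions import Fraction as F


def mixed_moment(i, b, p1, p2, alpha1, alpha2):
    """Moment of eps**i * x**j under (5.3), given class moments alpha1 = alpha_j1, alpha2 = alpha_j2.

    The eps = +1 and eps = -1 branches of (5.3) are weighted by 1**i and (-1)**i.
    """
    plus = p1 * (1 - b) / 2 * alpha1 + p2 * (1 + b) / 2 * alpha2
    minus = p1 * (1 + b) / 2 * alpha1 + p2 * (1 - b) / 2 * alpha2
    return plus + (-1) ** i * minus


p1, p2, b = F(1, 3), F(2, 3), F(1, 2)
alpha1, alpha2 = F(5), F(-2)     # hypothetical class moments of some order j
for i in (1, 2, 3, 4):
    print(i, mixed_moment(i, b, p1, p2, alpha1, alpha2))
```

Even $i$ gives $p_1\alpha_{j1} + p_2\alpha_{j2} = 1/3$ regardless of $b$, while odd $i$ gives $b(p_2\alpha_{j2} - p_1\alpha_{j1}) = -3/2$, which vanishes when $b = 0$.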


Unequal teacher qualification with respect to the patterns of the first and second classes. In some practical problems, the teacher qualification in the pattern recognition system differs for the patterns of the first and second classes. Let us introduce the stochastic matrix

$$\begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{pmatrix} = \begin{pmatrix} a_1 & 1 - a_2\\ 1 - a_1 & a_2 \end{pmatrix},$$

where $a_{ij}$ is the probability that the supervisor instruction assigns patterns of the $j$-th class to the $i$-th class. In this case

$$f(x, \varepsilon) = \begin{cases} p_1 (1 - a_1) f_1(x) + p_2 a_2 f_2(x), & \varepsilon = 1,\\ p_1 a_1 f_1(x) + p_2 (1 - a_2) f_2(x), & \varepsilon = -1, \end{cases}$$

or, with $b_1 = 2a_1 - 1$ and $b_2 = 2a_2 - 1$,

$$f(x, \varepsilon) = \begin{cases} p_1 \dfrac{1 - b_1}{2} f_1(x) + p_2 \dfrac{1 + b_2}{2} f_2(x), & \varepsilon = 1,\\[8pt] p_1 \dfrac{1 + b_1}{2} f_1(x) + p_2 \dfrac{1 - b_2}{2} f_2(x), & \varepsilon = -1. \end{cases} \tag{5.9}$$

The analysis of different relationships between $b_1$ and $b_2$ is a subject of special discussion. For example, when $b_1 = 1$ and $b_2 = 0$, the teacher qualification equals 1 for the first class and 0 for the second class. In this case

$$f(x, \varepsilon) = \begin{cases} \dfrac{p_2}{2} f_2(x), & \varepsilon = 1,\\[8pt] p_1 f_1(x) + \dfrac{p_2}{2} f_2(x), & \varepsilon = -1. \end{cases}$$
This variant is intermediate between the learning and self-learning modes.
The moments of distribution (5.9) have the form

$$\overline{\varepsilon^{i} x^{j}} = p_1 \alpha_{j1} + p_2 \alpha_{j2} \ \text{ for even } i, \qquad \overline{\varepsilon^{i} x^{j}} = b_2 p_2 \alpha_{j2} - b_1 p_1 \alpha_{j1} \ \text{ for odd } i.$$


Joint distribution law in the case of “teacher’s slant about his abilities”. The teacher makes a certain number of mistakes when training the pattern recognition system. Let us characterize the “teacher’s slant about his qualification” by a coefficient $c$. For $c > b$ the teacher overestimates his abilities by $c - b$; for $c < b$ he underestimates them by $b - c$. This raises the question of how the teacher’s slant and his real qualification influence the operating characteristics of the neural network.

Similar problems can also be formulated for “sabotage”, when $-1 < b < 0$.

Let us designate the supervisor instruction in the case of complete teacher overestimation as $\varepsilon'$. The previous text assumed that $\varepsilon = \varepsilon'$. Because the teacher is uncertain about his abilities, a pattern is assigned to the first ($\varepsilon = -1$) or second ($\varepsilon = 1$) class with probability $(1 + c)/2$, and the same pattern is assigned to the opposite class with probability $(1 - c)/2$. The joint distribution of the random values $\varepsilon$ and $\varepsilon'$ can be represented in the form


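$$f(\varepsilon,\varepsilon') = \begin{cases} p_2 \dfrac{1+c}{2}, & \text{if } \varepsilon = 1,\ \varepsilon' = 1 \\[2mm] p_1 \dfrac{1+c}{2}, & \text{if } \varepsilon = -1,\ \varepsilon' = -1 \\[2mm] p_2 \dfrac{1-c}{2}, & \text{if } \varepsilon = -1,\ \varepsilon' = 1 \\[2mm] p_1 \dfrac{1-c}{2}, & \text{if } \varepsilon = 1,\ \varepsilon' = -1 \end{cases} \tag{5.10}$$

Consequently,

$$f_{\varepsilon}(\varepsilon) = \begin{cases} p_1 \dfrac{1-c}{2} + p_2 \dfrac{1+c}{2}, & \text{if } \varepsilon = 1 \\[2mm] p_1 \dfrac{1+c}{2} + p_2 \dfrac{1-c}{2}, & \text{if } \varepsilon = -1 \end{cases}$$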


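Replacing $\varepsilon$ with $\varepsilon'$ in the joint distribution (5.3), we obtain the following distribution:

$$f(x,\varepsilon) = \begin{cases} p_1 \dfrac{1-bc}{2} f_1(x) + p_2 \dfrac{1+bc}{2} f_2(x), & \text{if } \varepsilon = 1 \\[2mm] p_1 \dfrac{1+bc}{2} f_1(x) + p_2 \dfrac{1-bc}{2} f_2(x), & \text{if } \varepsilon = -1 \end{cases} \tag{5.11}$$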
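The derivation of (5.11) is given in [5-2], where particular and general cases are analyzed. It is shown there that, for different “teacher’s slant about his abilities” $c_1$ and $c_2$ with respect to the first and second pattern classes,

$$f(x,\varepsilon) = \begin{cases} p_1 \dfrac{2 + (c_2 - c_1) - b_1 (c_1 + c_2)}{4} f_1(x) + p_2 \dfrac{2 + (c_2 - c_1) + b_2 (c_1 + c_2)}{4} f_2(x), & \text{if } \varepsilon = 1 \\[2mm] p_1 \dfrac{2 + (c_1 - c_2) + b_1 (c_1 + c_2)}{4} f_1(x) + p_2 \dfrac{2 + (c_1 - c_2) - b_2 (c_1 + c_2)}{4} f_2(x), & \text{if } \varepsilon = -1 \end{cases}$$

For $b_1 = b_2 = b$ and $c_1 = c_2 = c$ this expression reduces to (5.11).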

The expressions for the eigendistributions, conditional distributions and distribution moments of the input signal can be obtained from the latter equation.
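The general formula can be reproduced by composing two label channels. The sketch rests on one assumed reading of the model:

- The teacher first produces an instruction $\varepsilon'$ according to (5.9), with qualifications $b_1$, $b_2$.
- The slant then keeps an instruction $\varepsilon' = -1$ with probability $(1 + c_1)/2$ and an instruction $\varepsilon' = +1$ with probability $(1 + c_2)/2$.

Exact rational arithmetic (`fractions.Fraction`) makes the comparison with the closed form exact. The function names are hypothetical.

```python
from fractions import Fraction as F


def composed_weights(b1, b2, c1, c2):
    """Coefficients of p1*f1(x) and p2*f2(x) in f(x, eps), keyed by (eps, class)."""
    # Stage 1, as in (5.9): P(eps' | class); class 1 -> correct eps' = -1, class 2 -> +1.
    stage1 = {1: {-1: (1 + b1) / 2, +1: (1 - b1) / 2},
              2: {+1: (1 + b2) / 2, -1: (1 - b2) / 2}}
    # Stage 2 (assumed slant model): probability that eps' is kept unchanged.
    keep = {-1: (1 + c1) / 2, +1: (1 + c2) / 2}
    weights = {}
    for eps in (+1, -1):
        for k in (1, 2):
            weights[(eps, k)] = sum(
                stage1[k][e1] * (keep[e1] if e1 == eps else 1 - keep[e1])
                for e1 in (+1, -1))
    return weights


def closed_form(b1, b2, c1, c2):
    """The coefficients of the general expression with slants c1, c2."""
    s = c1 + c2
    return {(+1, 1): (2 + (c2 - c1) - b1 * s) / 4,
            (+1, 2): (2 + (c2 - c1) + b2 * s) / 4,
            (-1, 1): (2 + (c1 - c2) + b1 * s) / 4,
            (-1, 2): (2 + (c1 - c2) - b2 * s) / 4}


args = (F(3, 5), F(1, 5), F(1, 2), F(-1, 3))
print(composed_weights(*args) == closed_form(*args))  # True
```

With equal slants and qualifications, the diagonal coefficients become $(1 \pm bc)/2$. This is the multiplicative effect of $b$ and $c$ seen in (5.11).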


5.3 
Joint Distribution Law for the Input Signal Probabilities 
in the Case of K Classes of Patterns 


When the number of classes is more than two, a matrix of probabilities $a_{kk'}$ is introduced a priori, where $a_{kk'}$ is the probability that the teacher will consider a pattern of the $k'$-th class as belonging to the $k$-th class:


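$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1K} \\ a_{21} & a_{22} & \cdots & a_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ a_{K1} & a_{K2} & \cdots & a_{KK} \end{bmatrix}$$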


It is evident that

$$\sum_{k=1}^{K} a_{kk'} = 1, \qquad k' = 1, \dots, K$$

The joint probability distribution law for the signals $x(n)$ and $\varepsilon(n)$ has the form

$$f(x,\varepsilon) = \sum_{k'=1}^{K} a_{kk'}\, p_{k'} f_{k'}(x), \quad \text{if } \varepsilon = k \tag{5.12}$$

where $k = 1, \dots, K$.
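In the learning mode, matrix $A$ is the unit matrix

$$A = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$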


In the self-learning mode, the probability of considering the k’-class pattern as 
belonging to the k-th class is equal to 1/K for all classes: 


$$A = \begin{bmatrix} 1/K & 1/K & \cdots & 1/K \\ \vdots & \vdots & \ddots & \vdots \\ 1/K & 1/K & \cdots & 1/K \end{bmatrix}$$


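In the “sabotage” mode, patterns of the $k'$-th class are assigned with some probability to any class except the $k'$-th class:

$$A = \begin{bmatrix} 0 & a_{12} & \cdots & a_{1K} \\ a_{21} & 0 & \cdots & a_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ a_{K1} & a_{K2} & \cdots & 0 \end{bmatrix}$$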


Let us introduce the notion of the teacher qualification $b_k$ for the pattern recognition system in the case of $K$ classes. For $K > 2$, the relationship between the probabilities $a_{kk}$ and $b_k$ is nonlinear, because

$$b_k = \begin{cases} 1, & \text{if } a_{kk} = 1 \\ 0, & \text{if } a_{kk} = 1/K \\ -1, & \text{if } a_{kk} = 0 \end{cases} \tag{5.13}$$
If this relationship is approximated by a second-order function $a_{kk} = q_2 b_k^2 + q_1 b_k + q_0$, then the substitution of (5.13) gives

$$a_{kk} = \frac{K-2}{2K}\, b_k^2 + \frac{1}{2}\, b_k + \frac{1}{K} \tag{5.14}$$

Similarly, a second-order approximation of the inverse relationship $b(a)$ has the form

$$b_k = \frac{K(K-2)}{1-K}\, a_{kk}^2 + \frac{K^2-2}{K-1}\, a_{kk} - 1 \tag{5.15}$$
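The following sketch checks (5.13) to (5.15): both quadratics pass through the three reference points of (5.13). It uses exact fractions, and the function names are illustrative.

```python
from fractions import Fraction as F


def a_of_b(b, K):
    """Second-order approximation (5.14): a_kk as a function of b_k."""
    return F(K - 2, 2 * K) * b ** 2 + F(1, 2) * b + F(1, K)


def b_of_a(a, K):
    """Second-order approximation (5.15): b_k as a function of a_kk."""
    return F(K * (K - 2), 1 - K) * a ** 2 + F(K ** 2 - 2, K - 1) * a - 1


for K in range(2, 7):
    # Reference points of (5.13): (b, a) = (1, 1), (0, 1/K), (-1, 0)
    print(K, [a_of_b(b, K) for b in (1, 0, -1)], [b_of_a(a, K) for a in (1, F(1, K), 0)])
```

For $K = 2$ the two approximations are exact inverses, $a_{kk} = (1 + b_k)/2$. For $K > 2$ they agree only at the three reference points. For example, with $K = 3$, $b = 1/2$ maps to $a = 5/8$, which maps back to $b = 77/128$.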


Either (5.14) or (5.15) can be used in concrete calculations. Expression (5.12) yields the final expression for the distribution moment:

$$\overline{\varepsilon^{i} x^{j}} = \sum_{k=1}^{K} \sum_{k'=1}^{K} k^{i}\, p_{k'}\, a_{kk'}\, \alpha_{jk'}$$
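The double sum in the moment formula is straightforward to evaluate. A minimal sketch follows. It assumes the matrix is given as a nested list with `A[k-1][kp-1]` $= a_{kk'}$ and that the class moments $\alpha_{jk'}$ are known numbers. The function name is hypothetical.

```python
def k_class_moment(i, A, p, alpha_j):
    """E[eps**i * x**j] for distribution (5.12); eps takes the values k = 1..K.

    A[k-1][kp-1] = a_{k k'} (each column sums to 1), p[kp-1] = p_{k'},
    alpha_j[kp-1] = j-th order moment of the k'-class patterns.
    """
    K = len(p)
    return sum(
        k ** i * A[k - 1][kp - 1] * p[kp - 1] * alpha_j[kp - 1]
        for k in range(1, K + 1)
        for kp in range(1, K + 1)
    )


p, alpha = [0.2, 0.3, 0.5], [1.0, 2.0, 4.0]
unit = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]          # learning mode
uniform = [[1 / 3] * 3 for _ in range(3)]          # self-learning mode
print(k_class_moment(1, unit, p, alpha), k_class_moment(1, uniform, p, alpha))
```

With the unit matrix, the result is $\sum_k k^i p_k \alpha_{jk}$. With the self-learning matrix, only the factor $\sum_k k^i / K$ remains in front of $\sum_{k'} p_{k'} \alpha_{jk'}$.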
The joint probability distribution of the input signal, in the case of $K$ pattern classes, arbitrary teacher qualification, and “teacher’s slant about its abilities” with respect to each class, has the form

$$f(x,\varepsilon) = \sum_{k=1}^{K} c_{lk} \sum_{k'=1}^{K} a_{kk'}\, p_{k'} f_{k'}(x), \quad \text{if } \varepsilon = l, \quad l = 1, \dots, K$$

where the probability matrix $C = [c_{lk}]$ characterizes the “teacher’s slant about its abilities”: $c_{lk}$ is the probability of considering $k$-class patterns as $l$-class ones. If we denote


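$$d_{lk'} = \sum_{k=1}^{K} c_{lk}\, a_{kk'}$$

then one obtains

$$f(x,\varepsilon) = \sum_{k'=1}^{K} d_{lk'}\, p_{k'} f_{k'}(x), \quad \text{if } \varepsilon = l, \quad l = 1, \dots, K$$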
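Combining slant and qualification amounts to the matrix product $D = CA$. A minimal sketch with plain nested lists follows; the helper names are hypothetical. For two classes, it parameterizes $A$ and $C$ by $b$ and $c$ with the first row and column standing for the first class. In that case it reproduces the $(1 \pm bc)/2$ coefficients of (5.11).

```python
from fractions import Fraction as F


def compose(C, A):
    """d_{l k'} = sum_k c_{l k} a_{k k'}, i.e. the matrix product D = C A."""
    K = len(A)
    return [[sum(C[l][k] * A[k][kp] for k in range(K)) for kp in range(K)]
            for l in range(K)]


def columns_sum_to_one(M, tol=1e-12):
    """True if every column of M sums to 1 (column-stochastic matrix)."""
    return all(abs(sum(row[j] for row in M) - 1) < tol for j in range(len(M)))


b, c = F(1, 2), F(3, 4)
A = [[(1 + b) / 2, (1 - b) / 2], [(1 - b) / 2, (1 + b) / 2]]
C = [[(1 + c) / 2, (1 - c) / 2], [(1 - c) / 2, (1 + c) / 2]]
D = compose(C, A)
print(D[0][0] == (1 + b * c) / 2, columns_sum_to_one(D))  # True True
```

Because a product of column-stochastic matrices is column-stochastic, $D$ satisfies the same normalization as $A$.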


The case of a pattern class continuum. The eigendistribution of supervisor instructions in the learning mode for the recognition of $K$ classes has the form

$$f_{\varepsilon}(\varepsilon) = p_k, \quad \text{when } \varepsilon = k, \quad k = 1, \dots, K$$


This is a function of the discrete argument $\varepsilon$. A continuous distribution function is of wide practical use when the neural network teacher cannot clearly determine which class a pattern belongs to. It is possible (but not desirable, because information is lost) to partition the $\varepsilon$ axis into $K$ parts and reduce the problem with a class continuum to the problem with $K$ pattern classes. For a continuum of pattern classes, a unit matrix $A$, and a continuous function $f_{\varepsilon}(\varepsilon)$ in the learning mode, one obtains, similarly to (5.12),

$$f(x,\varepsilon) = f_{\varepsilon}(\varepsilon)\, f'(x \mid \varepsilon)$$

and in the self-learning mode,

$$f(x,\varepsilon) = f_{\varepsilon}(\varepsilon)\, f(x)$$


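In the case of arbitrary teacher qualification, a function $a(\varepsilon',\varepsilon)$ is introduced. It is the probability of considering patterns that objectively correspond to the distribution $f(x,\varepsilon')$ as corresponding to the distribution $f(x,\varepsilon)$. Then

$$\int_{-\infty}^{\infty} a(\varepsilon',\varepsilon)\, d\varepsilon = 1$$

The joint probability distribution $f(x,\varepsilon)$ of the neural network input signals $x(n)$ and $\varepsilon(n)$ will have the form

$$f(x,\varepsilon) = \int_{-\infty}^{\infty} a(\varepsilon',\varepsilon)\, f(x,\varepsilon')\, d\varepsilon'$$

For the learning mode, $a(\varepsilon',\varepsilon) = \delta(\varepsilon' - \varepsilon)$, and the right-hand side reduces to $f(x,\varepsilon)$ itself.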


About nonstationary neural network input signals. For nonstationary input signals, one considers assemblages of patterns distributed inside each class according to the time-dependent law $f_k(x,n)$. The distribution $f(x,\varepsilon)$ can change with time $n$ because the conditional densities $f'(x \mid \varepsilon)$ change or because the distribution of supervisor instructions $f_{\varepsilon}(\varepsilon)$ changes. The general expression for the joint distribution law of the neural network input signal has the form

$$f(x,\varepsilon,n) = \sum_{k'=1}^{K} d_{lk'}\, p_{k'} f_{k'}(x,n), \quad \text{if } \varepsilon = l, \quad l = 1, \dots, K$$


In principle, one can also consider the more general case of time-dependent teacher qualification and “teacher’s slant about its abilities”. The expression for the moments of such a distribution at the current time has the form

$$\overline{\varepsilon^{i} x^{j}}(n) = \sum_{k=1}^{K} \sum_{k'=1}^{K} k^{i}\, d_{kk'}(n)\, p_{k'}\, \alpha_{jk'}(n)$$


This chapter describes the analysis of distribution functions of the neural network input signal for arbitrary teacher qualification. In particular, we consider learning, self-learning and “sabotage”, as well as some intermediate cases. In general, the teacher can instruct the neural network in the form of a multidimensional vector $\varepsilon(n)$ of dimensionality $N$. In most cases the formal expressions for the input signal distribution function remain the same. The expressions for the input signal distribution laws can be written in general form in terms of the a priori probabilities of class appearance and the conditional distributions $f'(x \mid \varepsilon)$.
Chapter 6


Design of Neural Network Optimal Models 


6.1 
General Structure of the Optimal Model 


The optimal neural network model is the transformation of the input signal $[x(n), \varepsilon(n)]$ into the output signal $y(n)$ that is optimal according to the selected primary optimization criterion (Fig. 6.1).

The upper unit in Fig. 6.1 represents a controlled system, and the lower unit represents an optimal model that adjusts the controlled system. The general approach to designing optimal models of a pattern recognition system in the learning mode admits arbitrary characteristics of the input signal, which may represent two, $K$, or a continuum of pattern classes. The number of neural network solutions is also arbitrary: the solution space may have two, $K_p$, or a continuum of gradations. The supervisor instructions and solution characteristics are selected a priori and independently.

The optimal model is designed according to the selected criterion of primary optimization and is described in the form of a divisional surface. The divisional surface partitions the multidimensional feature space into non-overlapping regions, each of which is assigned to a particular class.

Table 6.1 classifies neural networks according to the input signal characteristics and the solution space, for the particular case of one-dimensional signals $y(n)$ and $\varepsilon(n)$.


The system of the first type, with two pattern classes and a binary output, is the most widely used.


The system of the second type recognizes $K$ pattern classes and has $K$ solutions. Such systems are investigated in [6-1 to 6-3] for different criteria of primary optimization and different a priori information about the input signal characteristics. It is sometimes assumed that the problem of learning during the recognition of $K$ pattern classes can be reduced to the sequential, step-by-step use of a learning algorithm for two classes; this assumption is not correct. This is a separate problem whose optimal solution requires the generation of an equivalent divisional surface starting from the first step.

Fig. 6.1.

Table 6.1. Classification of neural networks (table entries are the neural network types)

| Solution space (number of solutions) | Input signal: two classes | Input signal: $K$ classes | Input signal: continuum of classes |
|---|---|---|---|
| Two | 1 | 7 | 8 |
| $K_p$ | 3a ($K_p = 3$); 3b ($K_p$ = const. $\ge 4$) | 9 ($K_p < K$); 2 ($K_p = K$); 4 ($K_p > K$) | 10 |
| Continuum | 5 | 6 | 11 |

The neural network solution space is characterized by the number of amplitude quantization levels of the output signal $y(n)$ in each channel. Neural networks with a continuum of solutions (types 5, 6 and 11 in Table 6.1) have a continuous output signal. Neural networks of types 8, 10 and 11 have a continuous distribution of supervisor instructions.


6.2 
Analytical Representation of Divisional Surfaces 
in Typical Neural Networks 


Methods for designing optimal neural network models for the recognition of $K$ pattern classes are given in [6-1 to 6-3]. The expressions for the optimal divisional surfaces are obtained in two steps: first the functional of primary optimization to be minimized is written, then the minimization problem for this functional is solved under the existing constraints.

The optimal neural network model is determined by the system of inequalities that partitions the initial feature space into $K$ regions. Let us consider the design of the optimal models for the neural networks listed in Table 6.1.


Neural network of the third type. The pattern recognition system that is optimal by the maximum posterior probability criterion (in the case of two solutions) transforms the input signal $x(n)$ into the output signal $y(n)$ according to the following expression:

$$y(n) = \begin{cases} 1, & \text{if } f(\varepsilon = 1 \mid x) > f(\varepsilon = -1 \mid x) \\ -1, & \text{if } f(\varepsilon = -1 \mid x) > f(\varepsilon = 1 \mid x) \end{cases}$$


The divisional surface passes through the points $x$ that have equal posterior probabilities of belonging to the first and second classes. The region of the multidimensional space with the higher posterior probability of belonging to the first class is taken as the region of the first class. In most practical tasks, however, points of the multidimensional feature space must be assigned to a class with a more definite probability. For the system of the first type (one divisional surface), the difference between the two posterior probabilities decreases as the points approach the divisional surface and becomes zero on the surface.


Fig. 6.2. The partition of the feature space by two divisional surfaces


The pattern recognition system of type 3a has two divisional surfaces. They partition the feature space into three regions (I, II and III in Fig. 6.2), one of which is inactive. In region I the system assigns the input pattern to the first class, in region II to the second class, and in region III the current pattern cannot be classified.

The multidimensional feature space must be divided into three regions:

• Region I corresponds to the neural network decision that the pattern belongs to the first class:

$$f(\varepsilon = -1 \mid x) - d_1 > f(\varepsilon = 1 \mid x)$$

• Region II corresponds to the neural network decision that the pattern belongs to the second class:

$$f(\varepsilon = -1 \mid x) + d_2 < f(\varepsilon = 1 \mid x)$$

• Region III corresponds to the case when the neural network cannot decide whether the current pattern belongs to the first or the second class:

$$f(\varepsilon = -1 \mid x) - d_1 < f(\varepsilon = 1 \mid x), \qquad f(\varepsilon = -1 \mid x) + d_2 > f(\varepsilon = 1 \mid x)$$


Here the parameters $d_1$ and $d_2$ ($0 \le d_1 < 1$, $0 \le d_2 < 1$) set the posterior probability margin required for assigning a pattern to the first or second class. In particular, it is possible that $d_1 = d_2 = d$ or $d_1 = d_2 = d = 0$; in the latter case the two divisional surfaces merge into one. With two divisional surfaces and optimal surface parameters, the pattern recognition system transforms the input signal $x(n)$ into the output signal $y(n)$ as follows:

- in region I, $y(n) = -1$ (first class);
- in region II, $y(n) = 1$ (second class);
- in region III, $y(n) = 0$ (no decision between the first and second classes).


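The general expressions for the divisional surfaces optimal by the posterior probability value have the form

$$\frac{p_1 f_1(x)}{p_1 f_1(x) + p_2 f_2(x)} - d_1 = \frac{p_2 f_2(x)}{p_1 f_1(x) + p_2 f_2(x)}$$

$$\frac{p_1 f_1(x)}{p_1 f_1(x) + p_2 f_2(x)} + d_2 = \frac{p_2 f_2(x)}{p_1 f_1(x) + p_2 f_2(x)}$$

After transformations one obtains

$$S'(x) = \frac{f_2(x)}{f_1(x)} - \frac{1-d_1}{1+d_1} \cdot \frac{p_1}{p_2}$$

$$S''(x) = \frac{f_2(x)}{f_1(x)} - \frac{1+d_2}{1-d_2} \cdot \frac{p_1}{p_2}$$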

These are the final expressions for the divisional surfaces that are optimal when the posterior probability value is the primary optimization criterion. A more detailed interpretation is given in [6-1, 6-2].

The pattern recognition system that is optimal by the minimum average risk criterion partitions the multidimensional feature space into three regions. These are the region of first-class patterns, the region of second-class patterns, and the region where the neural network cannot make a decision:

$$S'(x) < 0, \qquad S''(x) > 0, \qquad S''(x) < 0 < S'(x) \tag{6.2}$$

The conditional risk function is the sum of losses due to considering patterns of the $i$-th class as patterns of the $j$-th class. The losses are calculated as the corresponding probabilities multiplied by the coefficients $l_{ij}$ ($i = 1, 2$; $j = 1, 0, 2$) of the loss matrix

$$L = \begin{bmatrix} l_{11} & l_{10} & l_{12} \\ l_{21} & l_{20} & l_{22} \end{bmatrix}$$

The coefficients $l_{10}$ and $l_{20}$ are the losses incurred when the recognition system cannot make a decision. It is evident that

$$l_{11} < l_{10} < l_{12}, \qquad l_{21} > l_{20} > l_{22}$$
The conditional risk functions have the following form (here and below, the integrals are $N$-fold integrals over regions of the feature space):

$$r_1 = \int_{S'(x)<0} l_{11} f_1(x)\,dx + \int_{S''(x)<0<S'(x)} l_{10} f_1(x)\,dx + \int_{S''(x)>0} l_{12} f_1(x)\,dx \tag{6.3}$$

$$r_2 = \int_{S'(x)<0} l_{21} f_2(x)\,dx + \int_{S''(x)<0<S'(x)} l_{20} f_2(x)\,dx + \int_{S''(x)>0} l_{22} f_2(x)\,dx \tag{6.4}$$


Averaging the conditional risk functions gives the average risk function

$$R = \int_{S'(x)<0} \big[l_{11} p_1 f_1(x) + l_{21} p_2 f_2(x)\big]dx + \int_{S''(x)<0<S'(x)} \big[l_{10} p_1 f_1(x) + l_{20} p_2 f_2(x)\big]dx + \int_{S''(x)>0} \big[l_{12} p_1 f_1(x) + l_{22} p_2 f_2(x)\big]dx$$


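Taking into account that

$$\int_{S''(x)<0<S'(x)} = \int_{S''(x)<0} - \int_{S'(x)<0}, \qquad \int_{S''(x)>0} = \int_{X} - \int_{S''(x)<0}$$

$$\int_{X} \big[l_{12} p_1 f_1(x) + l_{22} p_2 f_2(x)\big]dx = l_{12} p_1 + l_{22} p_2$$

where $X$ is the whole feature space,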


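the expression for the average risk function takes the form

$$R = (l_{12} p_1 + l_{22} p_2) + \int_{S'(x)<0} \big[l_{11} p_1 f_1(x) + l_{21} p_2 f_2(x) - l_{10} p_1 f_1(x) - l_{20} p_2 f_2(x)\big]dx + \int_{S''(x)<0} \big[l_{10} p_1 f_1(x) + l_{20} p_2 f_2(x) - l_{12} p_1 f_1(x) - l_{22} p_2 f_2(x)\big]dx$$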
Consequently, the final expression for the average risk function is

$$R = (l_{12} p_1 + l_{22} p_2) + \int_{S'(x)<0} \big[(l_{11} - l_{10}) p_1 f_1(x) + (l_{21} - l_{20}) p_2 f_2(x)\big]dx + \int_{S''(x)<0} \big[(l_{10} - l_{12}) p_1 f_1(x) + (l_{20} - l_{22}) p_2 f_2(x)\big]dx \tag{6.5}$$


We now derive the expressions for $S'(x)$ and $S''(x)$ that minimize $R$. It is relatively simple to show that the minimum is achieved when each integrand is negative inside its region of integration and positive outside it. Consequently, the minimum of $R$ is achieved when

$$S'(x) = (l_{11} - l_{10}) p_1 f_1(x) + (l_{21} - l_{20}) p_2 f_2(x)$$

$$S''(x) = (l_{10} - l_{12}) p_1 f_1(x) + (l_{20} - l_{22}) p_2 f_2(x) \tag{6.6}$$

Expressions (6.2) and (6.6) determine the optimal neural network model for the recognition of two pattern classes with two divisional surfaces. Let us assume

$$l_{11} = l_{22} = 0, \quad l_{12} = l_{21} = 1, \quad l_{10} = l_{20} = l_0, \quad p_1 = p_2$$

Then, dropping the common positive factor $p_1 = p_2$,

$$S'(x) = (1 - l_0) f_2(x) - l_0 f_1(x)$$

$$S''(x) = l_0 f_2(x) - (1 - l_0) f_1(x)$$


Figure 6.3 illustrates how the thresholds $h_1$ and $h_2$ change with $l_0$. The analysis of the expressions for the divisional surfaces leads to the following conclusions:

a If $l_0 = 0$, the region where the neural network cannot make a decision occupies the whole feature space, because refusing to decide incurs no loss;

b If $l_0 = 0.5$, the neural network with two divisional surfaces reduces to the neural network with one divisional surface. In this case, the loss for not recognizing a pattern is half the loss for a wrong recognition, and the loss for a correct recognition is zero;

c If $0 < l_0 < 0.5$, there is an inactive region where the neural network does not assign the current patterns to any class;

d If $0.5 < l_0 < 1$, the neural network realizes two divisional surfaces. The curves in Fig. 6.3 for the change of the thresholds are symmetrical both relative to the line $f_1(x) = f_2(x)$ and relative to the level $l_0 = 0.5$. The threshold $h_1$ determines the surface $S'(x)$ and the threshold $h_2$ determines the surface $S''(x)$;

e If $l_0 = 1$, the whole multidimensional feature space is considered to belong to both the first and the second classes.
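Under the symmetric assumptions above, $S'(x) < 0$ exactly when $f_2(x)/f_1(x) < l_0/(1 - l_0)$, and $S''(x) > 0$ exactly when $f_2(x)/f_1(x) > (1 - l_0)/l_0$. The sketch takes these two ratios as the thresholds $h_1$ and $h_2$. That is an assumption about how Fig. 6.3 defines them. The function names are hypothetical.

```python
import math


def symmetric_surfaces(f1x, f2x, l0):
    """S'(x), S''(x) of (6.6) for l11 = l22 = 0, l12 = l21 = 1, l10 = l20 = l0, p1 = p2.

    f1x, f2x are the class densities f1(x), f2(x) at one point x; the common positive
    factor p1 = p2 is dropped because it does not move the surfaces.
    """
    s_prime = (1 - l0) * f2x - l0 * f1x
    s_second = l0 * f2x - (1 - l0) * f1x
    return s_prime, s_second


def ratio_thresholds(l0):
    """Thresholds on f2(x)/f1(x): S' < 0 iff ratio < h1, S'' > 0 iff ratio > h2."""
    h1 = math.inf if l0 == 1 else l0 / (1 - l0)
    h2 = math.inf if l0 == 0 else (1 - l0) / l0
    return h1, h2


def regions(f1x, f2x, l0):
    """(in region I, in region II) for one point; (False, False) means region III."""
    s_prime, s_second = symmetric_surfaces(f1x, f2x, l0)
    return s_prime < 0, s_second > 0


for l0 in (0.0, 0.3, 0.5, 0.8, 1.0):
    print(l0, ratio_thresholds(l0), regions(1.0, 1.0, l0))
```

The sketch reproduces the listed cases:

- At $l_0 = 0$ nothing is ever assigned to a class (case a).
- At $l_0 = 0.5$ both thresholds equal 1 and the two surfaces merge (case b).
- At $l_0 = 0.8$ a point with $f_1 = f_2$ lies in both regions I and II (case d).
- The identity $h_1(l_0) = h_2(1 - l_0)$ is the symmetry about $l_0 = 0.5$ noted in case d.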
Compare the optimal models designed by the posterior probability criterion (6.1) with those designed by the minimum average risk criterion (6.5). The two optimal models coincide when

$$d_1 = \frac{(l_{11} + l_{21}) - (l_{10} + l_{20})}{(l_{21} - l_{20}) + (l_{10} - l_{11})}, \qquad d_2 = \frac{(l_{12} + l_{22}) - (l_{10} + l_{20})}{(l_{12} - l_{10}) + (l_{20} - l_{22})}$$

This gives an additional interpretation of the coefficients $d_1$ and $d_2$. Either criterion can be used, depending on whether the a priori information concerns $d_i$ or $l_{ij}$.
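The sketch below computes $d_1$ and $d_2$ from a loss matrix. It then checks that the posterior rule and the minimum-risk rule make the same decisions on a grid of values of the ratio $p_2 f_2(x)/(p_1 f_1(x))$. The loss values are arbitrary examples that satisfy the orderings above, and the function names are hypothetical.

```python
def equivalent_margins(L):
    """d1, d2 for which the posterior rule reproduces the minimum-risk surfaces (6.6).

    L maps (i, j) to l_ij, with i in {1, 2} (true class) and j in {1, 0, 2}
    (decision; j = 0 means "no decision").
    """
    a = L[2, 1] - L[2, 0]
    b = L[1, 0] - L[1, 1]
    c = L[1, 2] - L[1, 0]
    d = L[2, 0] - L[2, 2]
    return (a - b) / (a + b), (c - d) / (c + d)


def risk_regions(L, ratio):
    """(region I, region II) from the signs of (6.6) divided by p1 f1(x) > 0."""
    s_prime = (L[1, 1] - L[1, 0]) + (L[2, 1] - L[2, 0]) * ratio
    s_second = (L[1, 0] - L[1, 2]) + (L[2, 0] - L[2, 2]) * ratio
    return s_prime < 0, s_second > 0


def posterior_regions(d1, d2, ratio):
    """(region I, region II) from f(eps=-1/x) - d1 > f(eps=1/x), f(eps=-1/x) + d2 < f(eps=1/x)."""
    return (1 - d1) > (1 + d1) * ratio, (1 + d2) < (1 - d2) * ratio


L = {(1, 1): 0.0, (1, 0): 0.2, (1, 2): 1.0, (2, 1): 1.5, (2, 0): 0.4, (2, 2): 0.1}
d1, d2 = equivalent_margins(L)
ratios = [0.05 * k for k in range(1, 101)]
print(d1, d2, all(risk_regions(L, r) == posterior_regions(d1, d2, r) for r in ratios))
```

In the symmetric case $l_{11} = l_{22} = 0$, $l_{12} = l_{21} = 1$, $l_{10} = l_{20} = l_0$, the formulas give $d_1 = d_2 = 1 - 2 l_0$.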


The analysis of the expression for the average risk function shows that the primary optimization criteria can also be considered under the following constraints:

1. Equality of the separate average components of the risk function

$$p_1 r_1 = p_2 r_2 \tag{6.7}$$

2. A constant component of the average risk function

$$p_2 r_2 = \alpha = \text{const.} \tag{6.8}$$

Let us write the Lagrange optimization functional for the first constraint in the form

$$J = R + \lambda (p_1 r_1 - p_2 r_2)$$


From (6.7), (6.3) and (6.4) one obtains

$$p_1 l_{12} + \int_{S'(x)<0} (l_{11} - l_{10}) p_1 f_1(x)\,dx + \int_{S''(x)<0} (l_{10} - l_{12}) p_1 f_1(x)\,dx = p_2 l_{22} + \int_{S'(x)<0} (l_{21} - l_{20}) p_2 f_2(x)\,dx + \int_{S''(x)<0} (l_{20} - l_{22}) p_2 f_2(x)\,dx \tag{6.9}$$
Minimizing the functional $J$ gives the equations for the optimal divisional surfaces:

$$S'(x) = (l_{11} - l_{10}) p_1 f_1(x) (1 + \lambda) + (l_{21} - l_{20}) p_2 f_2(x) (1 - \lambda)$$

$$S''(x) = (l_{10} - l_{12}) p_1 f_1(x) (1 + \lambda) + (l_{20} - l_{22}) p_2 f_2(x) (1 - \lambda) \tag{6.10}$$

The value of $\lambda$ is obtained from (6.9) and (6.10) for the corresponding constraint.
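A one-dimensional search is one way to find $\lambda$ numerically. The sketch below relies on several assumptions that are not in the text:

- Densities $f_1 = N(-1, 1)$ and $f_2 = N(+1, 1)$, so that each surface in (6.10) becomes a threshold on $x$.
- The symmetric losses used above, with $0 < l_0 < 0.5$.
- Bisection over $\lambda \in (-1, 1)$ until $p_1 r_1 = p_2 r_2$.

Bisection is valid here because $p_1 r_1 - p_2 r_2$ decreases as $\lambda$ grows. The function names are hypothetical.

```python
import math


def std_normal_cdf(z):
    """Phi(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def conditional_risks(lam, p1, p2, l0):
    """r1, r2 for the surfaces (6.10), assuming f1 = N(-1, 1), f2 = N(+1, 1) and
    losses l11 = l22 = 0, l12 = l21 = 1, l10 = l20 = l0 with 0 < l0 < 0.5."""
    ratio = (1 + lam) * p1 / ((1 - lam) * p2)
    # Here f2(x)/f1(x) = exp(2x), so S'(x) < 0 <=> x < t1 and S''(x) > 0 <=> x > t2.
    t1 = 0.5 * math.log(l0 / (1 - l0) * ratio)
    t2 = 0.5 * math.log((1 - l0) / l0 * ratio)
    # P1(x < t) = Phi(t + 1) for class 1; P2(x < t) = Phi(t - 1) for class 2.
    r1 = l0 * (std_normal_cdf(t2 + 1) - std_normal_cdf(t1 + 1)) + (1 - std_normal_cdf(t2 + 1))
    r2 = std_normal_cdf(t1 - 1) + l0 * (std_normal_cdf(t2 - 1) - std_normal_cdf(t1 - 1))
    return r1, r2


def balancing_multiplier(p1, p2, l0, iterations=200):
    """Bisection on lambda in (-1, 1) so that p1*r1 = p2*r2, i.e. constraint (6.7).

    A larger lambda enlarges region I, so p1*r1 - p2*r2 decreases with lambda.
    """
    lo, hi = -1 + 1e-12, 1 - 1e-12
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        r1, r2 = conditional_risks(mid, p1, p2, l0)
        if p1 * r1 - p2 * r2 > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam = balancing_multiplier(0.7, 0.3, 0.3)
r1, r2 = conditional_risks(lam, 0.7, 0.3, 0.3)
print(lam, 0.7 * r1, 0.3 * r2)
```

With equal priors the problem is symmetric and the multiplier is $\lambda = 0$. A similar search over $\lambda$ could presumably be used for constraint (6.8) and the surfaces (6.11).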


In the case of constraint (6.8), minimizing the average risk function gives the optimal divisional surfaces in the following form:

$$S'(x) = (l_{11} - l_{10}) p_1 f_1(x) + \lambda (l_{21} - l_{20}) p_2 f_2(x)$$

$$S''(x) = (l_{10} - l_{12}) p_1 f_1(x) + \lambda (l_{20} - l_{22}) p_2 f_2(x) \tag{6.11}$$

The value of $\lambda$ is obtained from (6.11) and the corresponding constraint

$$p_2 r_2 = p_2 l_{22} + \int_{S'(x)<0} (l_{21} - l_{20}) p_2 f_2(x)\,dx + \int_{S''(x)<0} (l_{20} - l_{22}) p_2 f_2(x)\,dx = \alpha$$


Neural network of the 3b type. Let us consider the neural network of type 3b (Table 6.1) for two pattern classes with $(K_p - 1)$ divisional surfaces. The notation $K_p = \text{const.}$ means that the number of solutions is a fixed integer equal to 4 or more.

Let us consider the optimal neural network model by the posterior probability criterion. The multidimensional space is partitioned as follows. The region $k_p$ ($k_p = 1, \dots, K_p$) is determined by the system of inequalities

$$f(\varepsilon = -1 \mid x) - d_{k_p-1,k_p} < f(\varepsilon = 1 \mid x) \le f(\varepsilon = -1 \mid x) - d_{k_p,k_p+1}$$

under the conditions $d_{0,1} = 1$, $d_{K_p,K_p+1} = -1$, and

$$d_{k_p,k_p+1} > 0 \ \text{ if } f(\varepsilon = -1 \mid x) > f(\varepsilon = 1 \mid x), \qquad d_{k_p,k_p+1} < 0 \ \text{ if } f(\varepsilon = -1 \mid x) < f(\varepsilon = 1 \mid x)$$

Figure 6.4 shows an example of such a partition in the one-dimensional case. The neural network output signal must have $K_p$ level gradations, i.e., for two pattern classes the neural network makes $K_p$ decisions. As Fig. 6.4 shows, the decisions are made with certain margins in the posterior probability.

Using the known expressions for the posterior probabilities $f(\varepsilon = -1 \mid x)$ and $f(\varepsilon = 1 \mid x)$, one can obtain the expression for the $k_p$-th solution region in the initial feature space:

$$\frac{1 - d_{k_p-1,k_p}}{1 + d_{k_p-1,k_p}} < \frac{p_2 f_2(x)}{p_1 f_1(x)} \le \frac{1 - d_{k_p,k_p+1}}{1 + d_{k_p,k_p+1}}$$
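The last inequality turns the posterior margins into $K_p$ intervals of the ratio $p_2 f_2(x)/(p_1 f_1(x))$. A minimal sketch follows. It assumes the margins are supplied as a decreasing list $d_{1,2}, \dots, d_{K_p-1,K_p}$ with values in $(-1, 1)$; the example values are arbitrary and the function name is hypothetical.

```python
import math


def solution_region(ratio, margins):
    """Index k_p (1..K_p) of the solution region for ratio = p2*f2(x) / (p1*f1(x)).

    margins = [d_{1,2}, ..., d_{K_p-1,K_p}], decreasing values in (-1, 1); the
    conventions d_{0,1} = 1 and d_{K_p,K_p+1} = -1 are added here.
    """
    d = [1.0] + list(margins) + [-1.0]
    for kp in range(1, len(d)):
        lower = (1 - d[kp - 1]) / (1 + d[kp - 1])
        upper = math.inf if d[kp] == -1.0 else (1 - d[kp]) / (1 + d[kp])
        if lower < ratio <= upper:
            return kp
    raise ValueError("ratio must be positive")


margins = [0.6, 0.2, -0.2, -0.6]          # K_p = 5 regions
print([solution_region(r, margins) for r in (0.1, 0.5, 1.0, 2.0, 10.0)])  # [1, 2, 3, 4, 5]
```

Region 1 collects the points that most confidently belong to the first class, and region $K_p$ those that most confidently belong to the second.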
Fig. 6.4. The consideration of the posterior probability primary optimization criterion in the case of two pattern classes and $(K_p - 1)$ divisional surfaces


Let us determine the optimal neural network model by the minimum average risk criterion. After the learning procedure, the neural network partitions the multidimensional feature space into $K_p$ regions with a priori losses in each of them. The matrix of loss coefficients has the form

$$L = \begin{bmatrix} l_{11} & l_{12} & \cdots & l_{1K_p} \\ l_{21} & l_{22} & \cdots & l_{2K_p} \end{bmatrix} \tag{6.12}$$

where $l_{ik_p}$ ($i = 1, 2$; $k_p = 1, \dots, K_p$) are the loss coefficients for assigning patterns of the $i$-th class to the $k_p$-th region. It is necessary that

$$l_{11} < l_{12} < \dots < l_{1K_p}, \qquad l_{21} > l_{22} > \dots > l_{2K_p}$$


The conditional risk functions have the following forms:

$$r_1 = \sum_{k_p=1}^{K_p} l_{1k_p} \int_{S^{k_p}(x)>0} f_1(x)\,dx, \qquad r_2 = \sum_{k_p=1}^{K_p} l_{2k_p} \int_{S^{k_p}(x)>0} f_2(x)\,dx$$
Here $S^{k_p}(x) > 0$ is the region of the multidimensional feature space corresponding to the $k_p$-th decision. Consequently, the average risk function is

$$R = \sum_{k_p=1}^{K_p} \int_{S^{k_p}(x)>0} \big[l_{1k_p} p_1 f_1(x) + l_{2k_p} p_2 f_2(x)\big]dx$$

Let us find the expressions for $S^{k_p}(x)$ that minimize the average risk function. With the additional notation

$$g_{k_p}(x) = l_{1k_p} p_1 f_1(x) + l_{2k_p} p_2 f_2(x)$$

one obtains

$$R = \sum_{k_p=1}^{K_p} \int_{S^{k_p}(x)>0} g_{k_p}(x)\,dx$$

Using the above expressions, it can be shown that the minimum of $R$ is achieved at

$$S^{k_p}(x) = g_{k_p'}(x) - g_{k_p}(x) > 0, \qquad k_p' = 1, \dots, K_p, \ k_p' \ne k_p$$

or

$$l_{1k_p} p_1 f_1(x) + l_{2k_p} p_2 f_2(x) - l_{1k_p'} p_1 f_1(x) - l_{2k_p'} p_2 f_2(x) < 0, \qquad k_p' = 1, \dots, K_p, \ k_p' \ne k_p$$

The expressions for other aforementioned optimization criteria can be derived simi- 
larly. 


Neural networks of types 4 and 9. This is the case of a neural network for the recognition of $K$ pattern classes with $(K_p - 1)$ divisional surfaces.

Under the minimum average risk criterion, the region of the $k_p$-th decision is determined by the following system of inequalities:

$$\sum_{k=1}^{K} (l_{kk_p} - l_{kk_p'})\, p_k f_k(x) < 0, \qquad k_p' = 1, \dots, K_p, \ k_p' \ne k_p$$
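Both systems of inequalities say that region $k_p$ is where $g_{k_p}(x) = \sum_k l_{kk_p} p_k f_k(x)$ is the smallest of all $g_{k_p'}(x)$. A minimal sketch of that pointwise decision follows. The densities at the current point, the priors and the loss values are arbitrary illustrative numbers, and the function name is hypothetical.

```python
def min_risk_solution(densities, priors, losses):
    """k_p (1-based) that minimizes g_{k_p}(x) = sum_k l_{k k_p} p_k f_k(x).

    densities[k] = f_k(x) at the current point x, priors[k] = p_k,
    losses[k][kp] = l_{k k_p}: loss for taking decision k_p on a pattern of class k.
    """
    n_solutions = len(losses[0])
    g = [
        sum(losses[k][kp] * priors[k] * densities[k] for k in range(len(priors)))
        for kp in range(n_solutions)
    ]
    # The minimizer satisfies sum_k (l_{k k_p} - l_{k k_p'}) p_k f_k(x) <= 0 for every k_p'.
    return min(range(n_solutions), key=g.__getitem__) + 1


# Two classes, K_p = 3 solutions (the middle one acts as "no decision")
losses = [[0.0, 0.3, 1.0],
          [1.0, 0.3, 0.0]]
print([min_risk_solution(d, [0.5, 0.5], losses) for d in ([1.0, 0.1], [1.0, 1.0], [0.1, 1.0])])
# [1, 2, 3]
```

The same function covers types 4 and 9, because it takes any number of classes $K$ and solutions $K_p$.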


Neural network of type 5. The neural network has two input pattern classes and an output signal that is continuous in amplitude. The input and output signals are, however, discrete in time.

The posterior probability primary optimization criterion requires an a priori function $d(y)$, which gives the required posterior probability margin for each output level $y$. The equation for the neural network optimal model with two pattern classes and a continuum of solutions has the form

$$f(\varepsilon = -1 \mid x) - d(y) = f(\varepsilon = 1 \mid x)$$
Fig. 6.5. The change of the error function during the transfer from two neural network solutions to $K_p$ or a continuum of solutions with two pattern classes: a two solutions; b three solutions; c $K_p$ solutions; d continuum of solutions


Consequently,

$$\frac{p_1 f_1(x)}{p_1 f_1(x) + p_2 f_2(x)} - d(y) = \frac{p_2 f_2(x)}{p_1 f_1(x) + p_2 f_2(x)}$$

$$p_1 f_1(x)\,[1 - d(y)] - [1 + d(y)]\, p_2 f_2(x) = 0 \tag{6.13}$$

This is the final expression for the neural network optimal model in this case. It determines the relationship between the input and output signals of the neural network.

Let us consider the minimum average risk criterion. Figure 6.5 illustrates this variant. Instead of the matrix of coefficients (6.12), one must introduce a vector function of errors

$$L(y) = \begin{bmatrix} l_1(y) \\ l_2(y) \end{bmatrix}$$
These are the losses incurred when decision $y$ is made for patterns belonging to the first and second classes.

In the case of $K_p$ neural network solutions, the conditional risk function for the first pattern class has the form

$$r_1 = \sum_{k_p=1}^{K_p} l_{1k_p} \int_{S^{k_p}(x)>0} f_1(x)\,dx$$


Let $X$ denote the whole multidimensional feature space, and let us introduce the additional notation

$$G(x,k_p) = \begin{cases} 1, & \text{if } x \in \{S^{k_p}(x) > 0\} \\ 0, & \text{if } x \notin \{S^{k_p}(x) > 0\} \end{cases}$$

Then the conditional risk function $r_1$ takes the following form:

$$r_1 = \sum_{k_p=1}^{K_p} l_{1k_p} \int_{X} G(x,k_p)\, f_1(x)\,dx$$


As in the previous cases, the function $G(x,k_p)$ is the object of synthesis. It determines the neural network optimal model, i.e., the optimal relationship between the neural network input and output signals.

The transfer to a continuum of solutions leads to the following expressions for the risk functions:

$$r_1 = \int_{Y} l_1(y) \int_{X} G(x,y)\, f_1(x)\,dx\,dy$$

$$R = p_1 r_1 + p_2 r_2 = \int_{Y} \int_{X} G(x,y)\big[p_1 l_1(y) f_1(x) + p_2 l_2(y) f_2(x)\big]\,dx\,dy$$

Introducing the notation

$$g(x,y) = p_1 l_1(y) f_1(x) + p_2 l_2(y) f_2(x)$$

one obtains the final expression for the average risk function

$$R = \int_{Y} \int_{X} G(x,y)\, g(x,y)\,dx\,dy \tag{6.14}$$

Fig. 6.6. Form of function $G(x,y)$ for the discrete set of solutions and two pattern classes

Fig. 6.7. Form of function $G(x,y)$ for a continuum of solutions and two pattern classes


Here the function $g(x,y)$ is represented in general form. The synthesized function $G(x,y)$ must be expressed through $g(x,y)$ in such a way that $R$ is minimized. Figure 6.6 shows this function for a one-dimensional feature space and a finite number $K_p$ of neural network solutions. In the continuum case, the function $G(x,y)$ takes the form shown in Fig. 6.7: a strip of unit height, whose shape is the object of synthesis.

Figure 6.8 shows a geometrical illustration of the function $G(x,y)$ in the simplest case. Consequently, optimizing the average risk function reduces to minimizing the area of the strip $G(x,y)\,g(x,y)$, i.e., the integral (6.14).
Fig. 6.8. Form of function $G(x,y)$ for a continuum of solutions and two pattern classes


In the expression for the average risk function (6.14),

$$G(x,y) = \begin{cases} 1, & \text{if } y = P(x) \\ 0, & \text{if } y \ne P(x) \end{cases}$$

where the function $P(x)$ is the object of synthesis; it is a transformation of the neural network input signal. Consequently, one obtains

$$R = \int_{X} g[x, P(x)]\,dx$$

where the function $g[x, P(x)]$ has the form

$$g[x, P(x)] = p_1 f_1(x)\, l_1[P(x)] + p_2 f_2(x)\, l_2[P(x)]$$


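Minimizing $R$ is a problem of the calculus of variations. The integrand depends only on the value of $P(x)$ at each point, so a necessary condition for the minimum is

$$\frac{\partial g[x, P(x)]}{\partial P(x)}\bigg|_{P(x) = P^{*}(x)} = 0$$

For the particular form of $g[x, P(x)]$ given above,

$$p_1 f_1(x) \frac{d l_1[P(x)]}{d P(x)} + p_2 f_2(x) \frac{d l_2[P(x)]}{d P(x)} = 0$$

or, equivalently,

$$p_1 f_1(x) \frac{d l_1(y)}{d y}\bigg|_{y = P^{*}(x)} + p_2 f_2(x) \frac{d l_2(y)}{d y}\bigg|_{y = P^{*}(x)} = 0 \tag{6.15}$$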


This equation determines the neural network optimal model for two pattern classes 
and the continuum of solutions. 
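Equation (6.15) is the stationarity condition of $g[x, y]$ in $y$ at a fixed point $x$. A brute-force minimization over a grid of output levels should therefore agree with its root. The sketch uses the quadratic losses $l_1(y) = (1 + y)^2 l$ and $l_2(y) = (1 - y)^2 l$ with $l = 1$; this is the second particular case treated in this section. It treats $u = p_1 f_1(x)$ and $v = p_2 f_2(x)$ as given numbers. The function names are hypothetical.

```python
def first_class_loss(y):
    return (1 + y) ** 2          # l1(y) with l = 1


def second_class_loss(y):
    return (1 - y) ** 2          # l2(y) with l = 1


def grid_optimal_output(u, v, steps=20001):
    """Brute-force minimizer of g[x, y] = u*l1(y) + v*l2(y) over a grid on [-1, 1].

    u = p1*f1(x) and v = p2*f2(x) at the current point x.
    """
    ys = [-1 + 2 * i / (steps - 1) for i in range(steps)]
    return min(ys, key=lambda y: u * first_class_loss(y) + v * second_class_loss(y))


def closed_form_output(u, v):
    """Root of (6.15) for these quadratic losses: (p2 f2 - p1 f1) / (p2 f2 + p1 f1)."""
    return (v - u) / (u + v)


for u, v in [(0.3, 0.6), (0.9, 0.1), (0.5, 0.5)]:
    print(grid_optimal_output(u, v), closed_form_output(u, v))
```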
Let us consider some particular cases.


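1. The error function has the form shown in Fig. 6.9, with jumps $\Delta l_{1a}$, $\Delta l_{2a}$ at the output levels $y_a$ ($a = 1, \dots, A$). Thus

$$\frac{d l_1(y)}{d y}\bigg|_{y = P^{*}(x)} = \sum_{a=1}^{A} \Delta l_{1a}\, \delta(P^{*}(x) - y_a), \qquad \frac{d l_2(y)}{d y}\bigg|_{y = P^{*}(x)} = -\sum_{a=1}^{A} \Delta l_{2a}\, \delta(P^{*}(x) - y_a)$$

The equation for the neural network optimal model is

$$p_1 f_1(x) \sum_{a=1}^{A} \Delta l_{1a}\, \delta(P^{*}(x) - y_a) - p_2 f_2(x) \sum_{a=1}^{A} \Delta l_{2a}\, \delta(P^{*}(x) - y_a) = 0$$

Here $\delta(y)$ is the $\delta$-function with its known properties.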
2. The error functions for patterns of the first and second classes are functions of the second order, $l_1(y) = (1 + y)^2 l$ and $l_2(y) = (1 - y)^2 l$. Hence

$$\frac{d l_1(y)}{d y} = 2(1 + y)\, l, \qquad \frac{d l_2(y)}{d y} = -2(1 - y)\, l$$

Inserting these expressions into (6.15), one obtains the equation for the neural network optimal model $P^{*}(x)$ with a continuum of solutions and quadratic losses:

$$P^{*}(x) = \frac{p_2 f_2(x) - p_1 f_1(x)}{p_2 f_2(x) + p_1 f_1(x)}$$


Fig. 6.9. Error function dependences for the neural network with a continuum of solutions
Fig. 6.10. Illustration of the neural network optimal model with a solution continuum and quadratic losses


Figure 6.10 illustrates the function $y = P^{*}(x)$ realized by the neural network.
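3. The error functions are of the first order, $l_1(y) = (1 + y)\, l$ and $l_2(y) = (1 - y)\, l$. In this case $dl_1(y)/dy = l$ and $dl_2(y)/dy = -l$, so (6.15) no longer contains $y$. This creates some difficulties in the design of the neural network optimal model. Let the error function have the form

$$l_1(y) = \frac{l}{c+1}(1 + y)^{c+1}, \qquad l_2(y) = \frac{l}{c+1}(1 - y)^{c+1}$$

where $c = 1$ and $c = 0$ correspond (up to a constant factor) to cases 2 and 3, respectively. Then

$$\frac{d l_1(y)}{d y} = \frac{d}{d y}\left[\frac{l}{c+1}(1 + y)^{c+1}\right] = l\,(1 + y)^{c}, \qquad \frac{d l_2(y)}{d y} = \frac{d}{d y}\left[\frac{l}{c+1}(1 - y)^{c+1}\right] = -l\,(1 - y)^{c}$$

A general expression for the neural network optimal model is

$$p_1 f_1(x)\,(1 + y)^{c} - p_2 f_2(x)\,(1 - y)^{c} = 0$$

or, in another form,

$$y = \frac{[p_2 f_2(x)]^{1/c} - [p_1 f_1(x)]^{1/c}}{[p_2 f_2(x)]^{1/c} + [p_1 f_1(x)]^{1/c}}$$

Case 2 is obtained at $c = 1$. As $c \to 0$, $y = -1$ if $p_1 f_1(x) > p_2 f_2(x)$, and $y = 1$ if $p_1 f_1(x) < p_2 f_2(x)$.

Consequently, the continuum solution space is reduced to the space of two solutions.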
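The closed form for general $c$ can be written as a hyperbolic tangent: $y = \tanh\big[(\ln p_2 f_2(x) - \ln p_1 f_1(x))/(2c)\big]$. In this form, small values of $c$ do not overflow, and the collapse to the two solutions $\pm 1$ as $c \to 0$ is easy to see. The sketch treats $u = p_1 f_1(x)$ and $v = p_2 f_2(x)$ as given positive numbers. The function name is hypothetical.

```python
import math


def power_loss_output(u, v, c):
    """Root y of u*(1 + y)**c - v*(1 - y)**c = 0 for c > 0 (u = p1 f1(x), v = p2 f2(x)).

    Equivalent to y = (v**(1/c) - u**(1/c)) / (v**(1/c) + u**(1/c)), since
    (1 + y)/(1 - y) = (v/u)**(1/c) gives y = tanh(ln((v/u)**(1/c)) / 2).
    """
    return math.tanh((math.log(v) - math.log(u)) / (2 * c))


for c in (1.0, 0.5, 0.1, 0.001):
    print(c, power_loss_output(0.3, 0.6, c))
```

At $c = 1$ the output is $1/3$, the quadratic-loss value for these numbers. As $c$ decreases, the output approaches $+1$, the sign of $p_2 f_2(x) - p_1 f_1(x)$.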
It is seen that (6.13) and (6.15) coincide when

$$1 - d(y) = \frac{d l_1(y)}{d y}, \qquad 1 + d(y) = -\frac{d l_2(y)}{d y}$$

These expressions allow one to give the function $1 - d(y)$ an additional physical interpretation.
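As an illustration of the coincidence conditions, take the quadratic losses of case 2 with $l = 1/2$, an illustrative choice. Then $dl_1/dy = 1 + y$ and $-dl_2/dy = 1 - y$, so both conditions give $d(y) = -y$. Solving (6.13) with this margin function should then reproduce $P^{*}(x)$ from case 2. The sketch solves (6.13) by bisection, treating $u = p_1 f_1(x)$ and $v = p_2 f_2(x)$ as given numbers. The function names are hypothetical.

```python
def solve_margin_equation(u, v, d, iterations=200):
    """Root y in [-1, 1] of (6.13): u*(1 - d(y)) - (1 + d(y))*v = 0 by bisection.

    u = p1*f1(x), v = p2*f2(x); assumes the left side changes sign on [-1, 1].
    """
    def h(y):
        return u * (1 - d(y)) - (1 + d(y)) * v

    lo, hi = -1.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (h(mid) > 0) == (h(lo) > 0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def margin(y):
    # d(y) = -y, from the coincidence conditions with quadratic losses and l = 1/2
    return -y


u, v = 0.3, 0.6
print(solve_margin_equation(u, v, margin), (v - u) / (u + v))
```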

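Constraints of the types (6.7) and (6.8) can be imposed here as well. Constraint (6.7) corresponds to the condition

$$\int_{Y} \int_{X} G(x,y)\, g_1(x,y)\,dx\,dy = \int_{X} g_1[x, P(x)]\,dx = 0 \tag{6.16}$$

where

$$g_1(x,y) = p_1 f_1(x)\, l_1(y) - p_2 f_2(x)\, l_2(y)$$

Minimizing the average risk function (6.14) under this condition gives the following equation for the neural network optimal model:

$$(1 + \lambda)\, p_1 f_1(x) \frac{d l_1(y)}{d y}\bigg|_{y = P(x)} + (1 - \lambda)\, p_2 f_2(x) \frac{d l_2(y)}{d y}\bigg|_{y = P(x)} = 0 \tag{6.17}$$

The Lagrange multiplier $\lambda$ is determined by (6.16) and (6.17).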
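The constraint in the form of a given component of the average risk function is

$$\int_{Y} \int_{X} G(x,y)\, p_1 f_1(x)\, l_1(y)\,dx\,dy = \alpha \tag{6.18}$$

If we denote

$$g_2(x,y,\lambda) = g(x,y) + \lambda\, p_1 f_1(x)\, l_1(y) = (1 + \lambda)\, p_1 f_1(x)\, l_1(y) + p_2 f_2(x)\, l_2(y)$$

then the expression for the neural network optimal model is

$$\frac{\partial g_2[x, P(x), \lambda]}{\partial P(x)} = 0$$

or, in another form,

$$(1 + \lambda)\, p_1 f_1(x) \frac{d l_1(y)}{d y}\bigg|_{y = P(x)} + p_2 f_2(x) \frac{d l_2(y)}{d y}\bigg|_{y = P(x)} = 0$$

The multiplier $\lambda$ is determined by (6.14) and (6.18).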


Neural network of type 6. For $K$ pattern classes and a continuum of solutions, the pattern recognition system optimal under the minimum average risk criterion has the following neural network optimal model:

$$\sum_{k=1}^{K} p_k f_k(x) \frac{d l_k(y)}{d y}\bigg|_{y = P(x)} = 0$$

where $y = P(x)$ is the neural network optimal model.
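The type 6 condition becomes linear in $y$ for an assumed family of quadratic losses $l_k(y) = (y - y_k)^2$, where each class has a target output level $y_k$. This choice is illustrative and not made in the text. The condition reads $\sum_k p_k f_k(x)\, 2(y - y_k) = 0$, so the optimal output is the $p_k f_k(x)$-weighted mean of the targets. The function name is hypothetical.

```python
def type6_output(weights, targets):
    """Optimal output y = P(x) for K classes with assumed losses l_k(y) = (y - y_k)**2.

    weights[k] = p_k f_k(x) at the current point x, targets[k] = y_k.
    sum_k weights[k] * 2 * (y - y_k) = 0 gives the weighted mean of the targets.
    """
    total = sum(weights)
    return sum(w * t for w, t in zip(weights, targets)) / total


w, t = [0.2, 0.5, 0.3], [-1.0, 0.0, 1.0]
y = type6_output(w, t)
print(y, sum(wk * 2 * (y - tk) for wk, tk in zip(w, t)))
```

With two classes and targets $-1$, $+1$, this reduces to the quadratic-loss model $P^{*}(x)$ of case 2 above.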


Neural network of type 7. This is the recognition system for $K$ pattern classes with two solutions. For two classes, the minimum average risk criterion uses the following matrix of loss coefficients for considering patterns of the $i$-th class as belonging to the $j$-th class:

$$L = \begin{bmatrix} l_{11} & l_{12} \\ l_{21} & l_{22} \end{bmatrix}$$

For $K$ classes, one must introduce instead the matrix

$$L = \begin{bmatrix} l_{11} & l_{12} \\ l_{21} & l_{22} \\ \vdots & \vdots \\ l_{K1} & l_{K2} \end{bmatrix}$$

This is the matrix of loss coefficients for considering patterns of the $k$-th class ($k = 1, \dots, K$) as belonging to the regions of the multidimensional feature space that correspond to the first and second solutions. The conditional risk function is

$$r_k = l_{k1} \int_{S(x)<0} f_k(x)\,dx + l_{k2} \int_{S(x)>0} f_k(x)\,dx$$


The average risk function is the conditional risk function averaged across all the classes:

$$R = \sum_{k=1}^{K} p_k r_k = \sum_{k=1}^{K} p_k l_{k1} \int_{S(x)<0} f_k(x)\,dx + \sum_{k=1}^{K} p_k l_{k2} \int_{S(x)>0} f_k(x)\,dx$$

Taking into account that

$$\int_{S(x)>0} f_k(x)\,dx = 1 - \int_{S(x)<0} f_k(x)\,dx$$


6.2 - Analytical Representation of Divisional Surfaces in Typical Neural Networks 


107 


the final expression for the average risk function is 


K = 
R=) Puli t ff 
k=1 


S(x)<0 


K 
Yo Pea —hea) 
k=l 


f(x) dx 


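The minimum of R is achieved when the region S(x) < 0 contains exactly the points at which the integrand of the last integral is negative, i.e., at

$$S(x) = \sum_{k=1}^{K} p_k (l_{k1} - l_{k2})\, f_k(x)$$

This is the equation for the optimal divisional surface that determines the neural network optimal model.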
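The type 7 surface can be evaluated directly once the class densities are known. The sketch below assumes one-dimensional Gaussian densities and a hypothetical loss table in which classes 1 and 2 should receive the first solution and class 3 the second; the decision is solution 1 wherever S(x) = Σ_k p_k (l_{k1} − l_{k2}) f_k(x) is negative. Names and numbers are illustrative only.

```python
import math


def gauss_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def divisional_function(x, priors, densities, losses):
    """S(x) = sum_k p_k (l_k1 - l_k2) f_k(x); losses[k] = (l_k1, l_k2)."""
    return sum(p * (l1 - l2) * f(x)
               for p, f, (l1, l2) in zip(priors, densities, losses))


def decide(x, priors, densities, losses):
    """Solution 1 on the side S(x) < 0, solution 2 otherwise."""
    return 1 if divisional_function(x, priors, densities, losses) < 0 else 2


# Assumed example: three classes centred at -2, 0, 2 with equal priors.
priors = [1 / 3] * 3
densities = [lambda x, m=m: gauss_pdf(x, m, 0.5) for m in (-2.0, 0.0, 2.0)]
# Classes 1 and 2 cost nothing under solution 1; class 3 costs nothing under solution 2.
losses = [(0.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
```

With a 0-1 loss and K = 2 the same function reduces to S(x) = p_2 f_2(x) − p_1 f_1(x), the familiar Bayes rule.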


Neural network of type 8. The optimization of the neural network with a pattern class continuum and two solutions by the criterion of minimum risk requires the introduction of the matrix (row vector) L = [l_1(ε), l_2(ε)] of the loss functions corresponding to the first and second solutions. The conditional risk function is the risk of the decision made about the neural network input patterns that belong to the assemblage with distribution f'(x/ε). Typical loss functions l_1(ε) and l_2(ε) are shown in Fig. 6.11. The expression for the conditional risk function is

$$r(\varepsilon) = l_1(\varepsilon) \int_{S(x)<0} f'(x/\varepsilon)\,dx + l_2(\varepsilon) \int_{S(x)>0} f'(x/\varepsilon)\,dx$$

The equation S(x) = 0 is the equation of the divisional surface in the multidimensional feature space.


Fig. 6.11. Loss functions for the case with a class continuum and two solutions: a two decisions; b class continuum
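The average risk function is obtained by averaging across all the ε values:

$$R = \int_{-\infty}^{\infty} r(\varepsilon) f_\varepsilon(\varepsilon)\,d\varepsilon = \int_{-\infty}^{\infty} f_\varepsilon(\varepsilon) \left[ l_1(\varepsilon) \int_{S(x)<0} f'(x/\varepsilon)\,dx + l_2(\varepsilon) \int_{S(x)>0} f'(x/\varepsilon)\,dx \right] d\varepsilon$$

The final expression after transformation is

$$R = \int_{-\infty}^{\infty} l_2(\varepsilon) f_\varepsilon(\varepsilon)\,d\varepsilon + \int_{-\infty}^{\infty} [l_1(\varepsilon) - l_2(\varepsilon)]\, f_\varepsilon(\varepsilon) \int_{S(x)<0} f'(x/\varepsilon)\,dx\,d\varepsilon$$

or, in another form,

$$R = \int_{-\infty}^{\infty} l_2(\varepsilon) f_\varepsilon(\varepsilon)\,d\varepsilon + \int_{S(x)<0} \left\{ \int_{-\infty}^{\infty} [l_1(\varepsilon) - l_2(\varepsilon)]\, f_\varepsilon(\varepsilon)\, f'(x/\varepsilon)\,d\varepsilon \right\} dx$$

Consequently, the minimum is achieved at

$$S(x) = \int_{-\infty}^{\infty} [l_1(\varepsilon) - l_2(\varepsilon)]\, f_\varepsilon(\varepsilon)\, f'(x/\varepsilon)\,d\varepsilon$$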
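For a class continuum the optimal surface S(x) = ∫ [l_1(ε) − l_2(ε)] f_ε(ε) f'(x/ε) dε is a one-dimensional integral over ε for each x, so a simple quadrature is enough to evaluate it. The sketch assumes ε uniform on [−1, 1], f'(x/ε) Gaussian with mean ε, and step losses (solution 1 is penalized for ε > 0, solution 2 for ε < 0); these choices and the helper names are ours, for illustration.

```python
import math


def gauss_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def S_type8(x, l1, l2, f_eps, f_cond, eps_lo, eps_hi, n=2000):
    """Midpoint-rule approximation of
    S(x) = integral over [eps_lo, eps_hi] of [l1(e) - l2(e)] f_eps(e) f'(x/e) de."""
    h = (eps_hi - eps_lo) / n
    total = 0.0
    for i in range(n):
        e = eps_lo + (i + 0.5) * h
        total += (l1(e) - l2(e)) * f_eps(e) * f_cond(x, e)
    return total * h


# Assumed example: eps uniform on [-1, 1]; given eps, x ~ N(eps, 0.3**2).
l1 = lambda e: 1.0 if e > 0 else 0.0   # loss of solution 1
l2 = lambda e: 1.0 if e < 0 else 0.0   # loss of solution 2
f_eps = lambda e: 0.5
f_cond = lambda x, e: gauss_pdf(x, e, 0.3)

s_left = S_type8(-0.5, l1, l2, f_eps, f_cond, -1.0, 1.0)   # negative: solution 1
s_mid = S_type8(0.0, l1, l2, f_eps, f_cond, -1.0, 1.0)     # zero by symmetry
s_right = S_type8(0.5, l1, l2, f_eps, f_cond, -1.0, 1.0)   # positive: solution 2
```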
Neural network of type 10. The neural network creates a divisional surface in the multidimensional feature space in the case of a pattern class continuum and K_p solutions. The optimization by the criterion of minimum risk requires the introduction of the matrix (row vector)

$$L = [l_1(\varepsilon), \dots, l_{K_p}(\varepsilon)]$$

of the loss functions that emerge when patterns objectively obeying the law f'(x/ε) are considered as belonging to the regions of the multidimensional feature space corresponding to the first, second, ..., K_p-th neural network solutions. The conditional risk function is

$$r(\varepsilon) = \sum_{k_p=1}^{K_p} l_{k_p}(\varepsilon) \int_{s^{(k_p)}(x)>0} f'(x/\varepsilon)\,dx$$

The average risk function after averaging r(ε) is

$$R = \int_{-\infty}^{\infty} r(\varepsilon) f_\varepsilon(\varepsilon)\,d\varepsilon = \int_{-\infty}^{\infty} f_\varepsilon(\varepsilon) \sum_{k_p=1}^{K_p} l_{k_p}(\varepsilon) \int_{s^{(k_p)}(x)>0} f'(x/\varepsilon)\,dx\,d\varepsilon$$

Consequently, the optimal neural network model is determined by the system of inequalities (the region of the k_p-th solution):

$$\int_{-\infty}^{\infty} [l_{k_p}(\varepsilon) - l_{k_p'}(\varepsilon)]\, f_\varepsilon(\varepsilon)\, f'(x/\varepsilon)\,d\varepsilon < 0, \quad k_p' \ne k_p = 1,\dots,K_p$$
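The system of inequalities for type 10 says that x belongs to the region of the k_p-th solution when its expected loss ∫ l_{k_p}(ε) f_ε(ε) f'(x/ε) dε is smaller than that of every other solution. A minimal sketch, assuming ε uniform on [0, 3], Gaussian f'(x/ε), and three hypothetical losses |ε − c| centred at 0.5, 1.5 and 2.5; none of these choices come from the text.

```python
import math


def gauss_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def expected_loss(x, loss, f_eps, f_cond, eps_lo, eps_hi, n=2000):
    """Midpoint rule for the integral of loss(e) f_eps(e) f'(x/e) de."""
    h = (eps_hi - eps_lo) / n
    return h * sum(loss(e) * f_eps(e) * f_cond(x, e)
                   for e in (eps_lo + (i + 0.5) * h for i in range(n)))


def decide_type10(x, losses, f_eps, f_cond, eps_lo, eps_hi):
    """Return the 1-based k_p whose expected loss is below that of every other k_p'."""
    risks = [expected_loss(x, l, f_eps, f_cond, eps_lo, eps_hi) for l in losses]
    return 1 + min(range(len(risks)), key=risks.__getitem__)


# Assumed example: three solutions with losses growing away from 0.5, 1.5 and 2.5.
losses = [lambda e, c=c: abs(e - c) for c in (0.5, 1.5, 2.5)]
f_eps = lambda e: 1.0 / 3.0
f_cond = lambda x, e: gauss_pdf(x, e, 0.3)
```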


Neural network of type 11. This is the system of recognition of a continuum of pattern classes with a solution continuum. The criterion of minimum average risk requires not the matrix

$$L = [l_1(\varepsilon), \dots, l_{K_p}(\varepsilon)],$$

as in the previous case, but the loss function l(y, ε) that emerges when the neural network input patterns related to the assemblage with distribution f'(x/ε) are assigned the output y. The conditional risk function is

$$r(\varepsilon) = \int_Y \int_X l(y,\varepsilon)\, G(x,y)\, f'(x/\varepsilon)\,dx\,dy$$

The average risk function after averaging r(ε) is

$$R = \int_{-\infty}^{\infty} r(\varepsilon) f_\varepsilon(\varepsilon)\,d\varepsilon = \int_{-\infty}^{\infty} f_\varepsilon(\varepsilon) \int_Y \int_X l(y,\varepsilon)\, G(x,y)\, f'(x/\varepsilon)\,dx\,dy\,d\varepsilon = \int_Y \int_X G(x,y) \int_{-\infty}^{\infty} l(y,\varepsilon)\, f_\varepsilon(\varepsilon)\, f'(x/\varepsilon)\,d\varepsilon\,dx\,dy$$

Let us designate

$$g_3(x,y) = \int_{-\infty}^{\infty} l(y,\varepsilon)\, f_\varepsilon(\varepsilon)\, f'(x/\varepsilon)\,d\varepsilon$$

Then the expression for the average risk function is

$$R = \int_Y \int_X g_3(x,y)\, G(x,y)\,dx\,dy$$

Taking into account the aforementioned properties of G(x, y), one obtains

$$R = \int_X g_3[x, P(x)]\,dx,$$

where y = P(x) is the optimal neural network model. The optimization gives the following condition:

$$\left.\frac{\partial g_3(x,y)}{\partial y}\right|_{y=P(x)} = 0,$$

and, for the particular form of g_3(x, y),

$$\int_{-\infty}^{\infty} f_\varepsilon(\varepsilon)\, f'(x/\varepsilon) \left.\frac{\partial l(y,\varepsilon)}{\partial y}\right|_{y=P(x)} d\varepsilon = 0$$

This is the most general expression for the neural network optimal model; each of the aforementioned cases follows from it as a particular case.
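The type 11 model can be computed numerically by minimizing g_3(x, y) over y for each x. The sketch below assumes, for illustration only, a Gaussian prior ε ~ N(0, 1), f'(x/ε) = N(ε, 0.5²) and the quadratic loss l(y, ε) = (y − ε)². For that loss the stationarity condition gives the conditional mean of ε, which here equals 0.8x, so the numerical minimizer can be checked against a closed form. Ternary search is our choice and is valid only because g_3 is convex in y for this loss.

```python
import math


def gauss_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def g3(x, y, loss, f_eps, f_cond, eps_lo=-8.0, eps_hi=8.0, n=2000):
    """Midpoint rule for g3(x, y) = integral of l(y, e) f_eps(e) f'(x/e) de."""
    h = (eps_hi - eps_lo) / n
    total = 0.0
    for i in range(n):
        e = eps_lo + (i + 0.5) * h
        total += loss(y, e) * f_eps(e) * f_cond(x, e)
    return total * h


def optimal_model(x, loss, f_eps, f_cond, y_lo=-5.0, y_hi=5.0, iters=60):
    """P(x) = argmin over y of g3(x, y), by ternary search (assumes convexity in y)."""
    for _ in range(iters):
        m1 = y_lo + (y_hi - y_lo) / 3.0
        m2 = y_hi - (y_hi - y_lo) / 3.0
        if g3(x, m1, loss, f_eps, f_cond) < g3(x, m2, loss, f_eps, f_cond):
            y_hi = m2
        else:
            y_lo = m1
    return 0.5 * (y_lo + y_hi)


# Assumed example: eps ~ N(0, 1), x ~ N(eps, 0.5**2), loss (y - eps)**2.
sq_loss = lambda y, e: (y - e) ** 2
f_eps = lambda e: gauss_pdf(e, 0.0, 1.0)
f_cond = lambda x, e: gauss_pdf(x, e, 0.5)
```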


6.3
Optimal Neural Network Model for Multidimensional Signals ε(n) and y(n)

The expression for the conditional risk function has the following form:

$$r(\varepsilon) = \int_Y \int_X l(y,\varepsilon)\, G(x,y)\, f'(x/\varepsilon)\,dx\,dy$$

Then the average risk function is

$$R = \int_E r(\varepsilon) f_\varepsilon(\varepsilon)\,d\varepsilon = \int_Y \int_X G(x,y) \left[ \int_E l(y,\varepsilon)\, f_\varepsilon(\varepsilon)\, f'(x/\varepsilon)\,d\varepsilon \right] dx\,dy$$

and, after denoting the bracketed inner integral by g(x, y),

$$R = \int_Y \int_X g(x,y)\, G(x,y)\,dx\,dy$$

As mentioned above, E is the neural network supervisor instruction space, and N* is the dimensionality of E and of the neural network output signal. If N* = 1, then the neural network has K_p solutions, and the function G(x, y) has the form

$$G(x, k_p) = \begin{cases} 1, & \text{if } s^{(k_p)}(x) > 0 \\ 0, & \text{otherwise} \end{cases}$$

and, for the solution continuum,

$$G(x,y) = \begin{cases} 1, & \text{if } y = P(x) \\ 0, & \text{if } y \ne P(x) \end{cases}$$


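The transformation described by the system in the case of multidimensional E and Y can be written in the form

$$y(n) = P[x(n)] \quad \text{or} \quad \begin{bmatrix} y_1(n) \\ \vdots \\ y_{N^*}(n) \end{bmatrix} = \begin{bmatrix} P_1(x) \\ \vdots \\ P_{N^*}(x) \end{bmatrix}$$

If N* = const and the neural network has a discrete number of output signal values (K_p solutions), then the function G(x, y) has the form

$$G(x, k_{1p}, \dots, k_{N^*p}) = \begin{cases} 1, & \text{if } s^{(k_{1p},\dots,k_{N^*p})}(x) > 0 \\ 0, & \text{otherwise} \end{cases}$$

and, for the solution continuum, respectively,

$$G(x,y) = \begin{cases} 1, & \text{if } y = P[x(n)] \\ 0, & \text{if } y \ne P[x(n)] \end{cases}$$

Consequently, the expression for the average risk function is

$$R = \int_X g[x, P(x)]\,dx = \int_X \int_E l[P(x), \varepsilon]\, f_\varepsilon(\varepsilon)\, f'(x/\varepsilon)\,d\varepsilon\,dx$$

The neural network optimal model is determined by the expression

$$\int_E f_\varepsilon(\varepsilon)\, f'(x/\varepsilon) \left.\frac{\partial l(y,\varepsilon)}{\partial y}\right|_{y=P(x)} d\varepsilon = 0,$$

where the derivative ∂l(y, ε)/∂y is a function of the two variables y and ε.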
Let each of the N* output channels have K amplitude gradations. Then the expression for the conditional risk function has the form

$$r(k_1,\dots,k_{N^*}) = \sum_{k_{1p}=1}^{K} \cdots \sum_{k_{N^*p}=1}^{K} l(k_{1p},\dots,k_{N^*p}; k_1,\dots,k_{N^*}) \int_{s^{(k_{1p},\dots,k_{N^*p})}(x)>0} f'(x/\varepsilon = k_1,\dots,k_{N^*})\,dx$$

and the average risk function is

$$R = \sum_{k_1=1}^{K} \cdots \sum_{k_{N^*}=1}^{K} r(k_1,\dots,k_{N^*})\, f_\varepsilon(k_1,\dots,k_{N^*}) = \sum_{k_{1p}=1}^{K} \cdots \sum_{k_{N^*p}=1}^{K}\ \sum_{k_1=1}^{K} \cdots \sum_{k_{N^*}=1}^{K} l(k_{1p},\dots,k_{N^*p}; k_1,\dots,k_{N^*})\, f_\varepsilon(k_1,\dots,k_{N^*}) \int_{s^{(k_{1p},\dots,k_{N^*p})}(x)>0} f'(x/\varepsilon = k_1,\dots,k_{N^*})\,dx$$

After introducing additional designations,

$$R = \sum_{k_{1p}=1}^{K} \cdots \sum_{k_{N^*p}=1}^{K} \int_{s^{(k_{1p},\dots,k_{N^*p})}(x)>0} g(k_{1p},\dots,k_{N^*p}, x)\,dx,$$

where

$$g(k_{1p},\dots,k_{N^*p}, x) = \sum_{k_1=1}^{K} \cdots \sum_{k_{N^*}=1}^{K} l(k_{1p},\dots,k_{N^*p}; k_1,\dots,k_{N^*})\, f_\varepsilon(k_1,\dots,k_{N^*})\, f'(x/\varepsilon = k_1,\dots,k_{N^*}),$$

the result of the average risk function optimization in this case has the form

$$s^{(k_{1p},\dots,k_{N^*p})}(x) = g(k'_{1p},\dots,k'_{N^*p}, x) - g(k_{1p},\dots,k_{N^*p}, x) > 0, \quad (k'_{1p},\dots,k'_{N^*p}) \ne (k_{1p},\dots,k_{N^*p}),$$

$$(k_{1p},\dots,k_{N^*p}) = (1,\dots,1),\dots,(K,\dots,K),$$

i.e., K^{N*} combinations exist. The case K = 2 is the simplest one to implement.
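The decision rule above amounts to choosing, among all K^{N*} output combinations, the one with the smallest g(k_{1p}, ..., k_{N*p}, x). A brute-force sketch is shown below; `g_example` is a purely hypothetical stand-in for g (it is not the expected-loss function of the text), used only to make the enumeration runnable.

```python
import itertools


def decide_combination(x, K, n_channels, g):
    """Return the combination (k_1p, ..., k_N*p) with the smallest g(combo, x),
    i.e. the one for which every difference g(other, x) - g(combo, x) is positive
    (ties aside), plus the number of candidates examined, K ** n_channels."""
    combos = list(itertools.product(range(1, K + 1), repeat=n_channels))
    best = min(combos, key=lambda combo: g(combo, x))
    return best, len(combos)


def g_example(combo, x):
    # Hypothetical stand-in for g(k_1p, ..., k_N*p, x): squared distance between
    # the gradation levels and assumed per-channel targets stored in x.
    return sum((k - xi) ** 2 for k, xi in zip(combo, x))


best, count = decide_combination((1.2, 1.9, 1.1), 2, 3, g_example)
```

The exhaustive search grows as K^{N*}, which is one practical reason the text singles out K = 2 as the simplest case.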
6.4
A Priori Information about the Input Signal in the Self-Learning Mode


The self-learning mode differs from the learning mode in the absence of information, in the form of supervisor instructions, about patterns belonging to a particular class. Consequently, this information must be embedded in the neural network a priori. The following limitation on class determination in the self-learning mode is reasonable: exactly one mode of the distribution density of the input signal x(n) must correspond to each pattern class.

The a priori information about the input signal significantly influences the methods of solving the self-learning problem. This information can be divided into three main parts.


1. A priori information about the number of classes, i.e., about the number of modes of the input signal distribution density. According to this a priori information, the neural network input signal distribution can be represented in the form

$$f(x) = \sum_{k=1}^{K} p_k f_k(x) \qquad (6.19)$$

where x(n) is the input signal; f(x) is the input signal distribution density; f_k(x) is the distribution density of the patterns of the k-th class; p_k is the probability of emergence of a k-th class pattern; and K is the number of classes.
2. A priori information about the form of the pattern distribution density in each class.
3. A priori information about the probabilities p_k.


A priori information about the number of classes K (the number of modes of the input signal distribution density) can be of three types, ordered by decreasing amount of a priori information: the number of classes (modes) is known exactly; the number of classes (modes) does not exceed some given K_max; and the number of classes (modes) is not known.

In the first case, the self-learning algorithm must be developed for the given number of classes. In the second case, the self-learning algorithm must be optimal both for the maximum number of classes and for any smaller number of classes. In the third case, the self-learning algorithm can only produce a qualitative solution for a gradually increasing K_max, and a termination criterion must be introduced. Such a termination criterion can be the absence of improvement in self-learning quality as K_max increases, or excessive algorithm complexity.


A priori information about the form of the pattern distribution density in each class can also be of three types, ordered by decreasing amount of a priori information: the distribution form is known exactly; the distribution form is not known, but some approximation of it can be accepted; and the distribution form is not known.

The methods of implementing the neural network optimal model depend on the quantity of a priori information.




A priori information about the probability of emergence of a k-th class pattern. The a priori information for the representation (6.19) can be the following: the coefficients p_k are the same for all classes, or the coefficients differ between classes but are unknown.

The first case does not impose any limitations on the methods of solving the self-learning problem. The second case complicates the self-learning process, because the coefficients p_k must be determined in addition to the distribution parameters of each subclass during the adjustment procedure.


6.5
About Neural Network Primary Optimization Criteria in the Self-Learning Mode


The primary optimization criterion also represents additional information embedded in the neural network a priori. This criterion determines the quality of the recognition system that must be achieved in the self-learning mode.

The primary optimization criterion can be used in all of the aforementioned cases (known pattern distribution, approximated pattern distribution, unknown pattern distribution). The divisional surface is then calculated according to the following expression:

$$\frac{\partial f(x)}{\partial x} = 0 \quad \text{under the constraint} \quad \frac{\partial^2 f(x)}{\partial x^2} > 0,$$

i.e., the threshold is placed at a minimum of the input signal distribution density f(x). The solution of this equation corresponds to the threshold h_1 (Fig. 6.12).
The following criterion can be used when the pattern distribution across classes can be determined or approximated:

$$p_1 f_1(x) = p_2 f_2(x) \qquad (6.20)$$

The solution of this equation corresponds to the threshold h_2 (Fig. 6.12).

The use of the primary optimization criterion (6.20) in the self-learning mode corresponds to the way a human learns without a teacher in the case of two features and two classes (Fig. 6.13): the goal is to pass the divisional surface through the regions containing the fewest patterns.


Fig. 6.12. The introduction of primary optimization criteria for self-learning neural networks (density f(x) versus x; thresholds h_1, h_2)

Fig. 6.13. Illustration of criterion (6.20) (feature plane x_1, x_2)

Fig. 6.14. Comparison between primary optimization criteria in the self-learning mode
It can be shown that the optimal solutions according to the aforementioned primary optimization criteria in the self-learning mode are different. The particular case shown in Fig. 6.14 illustrates additional features of these criteria. Three possibilities can be distinguished:

1. The classes are easily separated, i.e., their overlap is small. In this case σ_1 and σ_2 are much smaller than the distance between the class centers. The optimal thresholds h_1 and h_2, which correspond to the first and second primary optimization criteria, are close to each other (Fig. 6.14a).

2. The class overlap is so large that the first criterion for the threshold h_1 is invalid. One of the parameters is larger than half of the distance between the class centers, and the other one is comparable with it (Fig. 6.14b, σ_1 = 1 > 0.5).

3. The class overlap is large, and the thresholds h_1 and h_2 differ greatly. The parameters σ_1 and σ_2 are of the same order as half of the distance between the class centers (Fig. 6.14c).


These results have the following explanation: in cases (1) and (3) the input signal distribution f(x) is bimodal, whereas in case (2) it is unimodal, and dividing its single hump into two classes has no clear qualitative sense.

This is the reason for introducing limitations related to the modal characteristics of the distribution density into the definition of classes in the pattern recognition problem. The representation of f(x) as a multimodal function allows one to use a special average risk function as the primary optimization criterion in the self-learning mode.
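The two self-learning thresholds can be compared numerically for a two-component Gaussian mixture: h_1 is an interior minimum of f(x) = p_1 f_1(x) + p_2 f_2(x), and h_2 solves p_1 f_1(x) = p_2 f_2(x), criterion (6.20). The sketch below assumes one-dimensional Gaussian classes (our choice) and reports `None` for h_1 when the mixture is unimodal between the means, which is the situation of case (2). A grid search and bisection are used for simplicity; the function names are hypothetical.

```python
import math


def gauss_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def threshold_h1(p1, m1, s1, p2, m2, s2, n=10000):
    """Criterion 1: an interior local minimum of the mixture density f(x) between
    the means. Returns None when f(x) has no such minimum (unimodal case)."""
    f = lambda x: p1 * gauss_pdf(x, m1, s1) + p2 * gauss_pdf(x, m2, s2)
    xs = [m1 + (m2 - m1) * i / n for i in range(n + 1)]
    vals = [f(x) for x in xs]
    for i in range(1, n):
        if vals[i] < vals[i - 1] and vals[i] <= vals[i + 1]:
            return xs[i]
    return None


def threshold_h2(p1, m1, s1, p2, m2, s2, iters=200):
    """Criterion (6.20): solve p1 f1(x) = p2 f2(x) by bisection between the means."""
    d = lambda x: p1 * gauss_pdf(x, m1, s1) - p2 * gauss_pdf(x, m2, s2)
    lo, hi = m1, m2
    if d(lo) * d(hi) > 0:
        return None  # no sign change between the means
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if d(lo) * d(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Well-separated, symmetric classes (case 1): both thresholds lie at the midpoint.
h1_sep = threshold_h1(0.5, 0.0, 0.15, 0.5, 1.0, 0.15)
h2_sep = threshold_h2(0.5, 0.0, 0.15, 0.5, 1.0, 0.15)
# Strongly overlapping classes (case 2): f(x) is unimodal, so h1 does not exist.
h1_overlap = threshold_h1(0.5, 0.0, 1.0, 0.5, 1.0, 1.0)
h2_overlap = threshold_h2(0.5, 0.0, 1.0, 0.5, 1.0, 1.0)
```

Trying unequal σ_1, σ_2 of the order of half the distance between the centers reproduces the qualitative picture of case (3), in which h_1 and h_2 move apart.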


6.6
Optimal Neural Network Models in the Self-Learning Mode
and Arbitrary Teacher Qualification


Let us suppose that patterns are grouped around some unknown class centers b_{k_p}. If the function of the distance between patterns and the k_p-th class is

$$\rho[x, b_{k_p}] = \|x - b_{k_p}\|^2,$$

then the average risk function of x belonging to the region of the k_p-th solution can be represented in the form

$$R_{k_p} = \int_{s^{(k_p)}(x)>0} \|x - b_{k_p}\|^2 f(x)\,dx,$$

where ‖·‖ is the norm of a vector. The average risk function is

$$R = \sum_{k_p=1}^{K_p} \int_{s^{(k_p)}(x)>0} \|x - b_{k_p}\|^2 f(x)\,dx$$

The region of the k_p-th solution (k_p = 1, ..., K_p) with minimum R is determined in this case by the following system of inequalities:

$$s^{(k_p)}(x) = \|x - b_{k_p'}\|^2 - \|x - b_{k_p}\|^2 > 0, \quad k_p' \ne k_p = 1,\dots,K_p \qquad (6.21)$$

The equation for the coordinates of the class centers with minimum R is

$$b_{k_p i} = \frac{\int_{s^{(k_p)}(x)>0} x_i f(x)\,dx}{\int_{s^{(k_p)}(x)>0} f(x)\,dx}, \quad i = 1,\dots,N;\ k_p = 1,\dots,K_p \qquad (6.22)$$


The systems (6.21), (6.22) determine the optimal neural network model in the self-learning mode. The loss function

$$\rho[x, b_{k_p}] = \|x - b_{k_p}\|^2$$

is a rather rough approximation of the distribution inside the class. A more precise approximation can be achieved by complicating the optimal model through a more complex loss function in the following way:


$$\rho[x, b_{k_p}] = \frac{\|x - b_{k_p}\|^2}{\sigma_{k_p}^2}$$


or by using a more complex function p(x, b;,) = I|x-b,|/?. 
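When f(x) is available only through samples, a natural (assumed) way to use (6.21) and (6.22) is to replace the integrals by sample averages and iterate the two steps: assign each sample to the nearest center, then move each center to the mean of its region. This is the familiar k-means (Lloyd) iteration; the substitution of sample averages is our illustration, not a procedure stated in the text, and the function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def self_learning_centers(samples, centers, iters=50):
    """Alternate (6.21) and a sample version of (6.22) until the centers stop moving."""
    centers = [list(c) for c in centers]
    for _ in range(iters):
        # (6.21): each sample goes to the region of the nearest center.
        groups = [[] for _ in centers]
        for x in samples:
            k = min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))
            groups[k].append(x)
        # (6.22): each center moves to the mean of the samples in its region.
        new_centers = []
        for c, g in zip(centers, groups):
            if g:
                new_centers.append([sum(col) / len(g) for col in zip(*g)])
            else:
                new_centers.append(c)  # an empty region keeps its center
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Assumed example: two clusters around (0, 0) and (5, 5).
rng = random.Random(0)
samples = ([(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]
           + [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(200)])
centers = sorted(self_learning_centers(samples, [(1.0, 1.0), (4.0, 4.0)]))
```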

Similarly to the learning mode, at the transfer to the solution continuum the loss function ρ[x, b(y)] is introduced in the self-learning mode. Here b(y) is either the final or an intermediate result of the neural network synthesis. In the case of a discrete set of solutions, the average risk function has the following form:

$$R = \sum_{k_p=1}^{K_p} \int_X \rho[x, b_{k_p}]\, G(x, k_p)\, f(x)\,dx,$$

where

$$G(x, k_p) = \begin{cases} 1, & \text{if } s^{(k_p)}(x) > 0 \\ 0, & \text{otherwise} \end{cases}$$

In the case of a solution continuum, one obtains

$$R = \int_Y \int_X \rho[x, b(y)]\, G(x,y)\, f(x)\,dx\,dy,$$

where

$$G(x,y) = \begin{cases} 1, & \text{if } y = P(x) \\ 0, & \text{if } y \ne P(x) \end{cases}$$

or, in another form,

$$R = \int_X \rho\{x, b[P(x)]\}\, f(x)\,dx$$

The expression for the optimal neural network model is obtained, as in the learning mode, by differentiating R with respect to y = P(x):

$$f(x) \left.\frac{\partial}{\partial y}\rho[x, b(y)]\right|_{y=P(x)} = 0, \qquad f(x)\,\frac{\partial \rho[x, b(y)]}{\partial b_i} = 0, \quad i = 1,\dots,N,$$

with additional Sylvester conditions for the matrix of mixed derivatives.

Let us consider the neural network optimal model with K_p = K solutions under an arbitrary teacher qualification b.

One must define the loss function l(x, b_{k_p}, b, l_{k_p k}) in such a way that in the learning mode (b = 1) l = l_{k_p k}; in the self-learning mode (b = 0) l = ρ(x, b_{k_p}); and for b = −1 the primary optimization functional has an extremum inverse to that of the learning mode. Such a loss function can be written in the following form:

$$l(x, b_{k_p}, b, l_{k_p k}) = b\, l_{k_p k} + (1 - b^2)\, \rho(x, b_{k_p})$$

The expression for the average risk function is

$$R = \sum_{k_p=1}^{K} \int_{s^{(k_p)}(x)>0} \sum_{k=1}^{K} p_k \left[ b\, l_{k_p k} + (1 - b^2)\, \|x - b_{k_p}\|^2 \right] f_k(x)\,dx$$

The optimal region of the k_p-th solution can be represented in the following form:

$$s^{(k_p)}(x) = g_{k_p'}(x) - g_{k_p}(x) > 0, \quad k_p' \ne k_p = 1,\dots,K_p,$$

where g_{k_p}(x) denotes the inner sum over k in the k_p-th term of R. The expression for the optimal values b_{k_p i} is similar to (6.22).
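The teacher-qualification loss l = b l_{k_p k} + (1 − b²) ρ(x, b_{k_p}) interpolates between supervised and unsupervised decisions. The sketch below evaluates g_{k_p}(x) = Σ_k p_k f_k(x)[b l_{k_p k} + (1 − b²)‖x − b_{k_p}‖²] and picks the smallest, assuming one-dimensional Gaussian class densities, a 0-1 loss matrix and centers chosen by hand so that the two regimes disagree. All names and numbers are illustrative, and indices are 0-based in the code.

```python
import math


def gauss_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def combined_loss(b, l_kpk, rho):
    """l = b * l_{k_p k} + (1 - b**2) * rho(x, b_{k_p})."""
    return b * l_kpk + (1.0 - b * b) * rho


def sq_dist(a, c):
    return sum((ai - ci) ** 2 for ai, ci in zip(a, c))


def decide_with_teacher(x, b, priors, densities, loss_matrix, centers):
    """Pick k_p minimizing g_{k_p}(x); loss_matrix[kp][k] = l_{k_p k}, centers[kp] = b_{k_p}."""
    def g(kp):
        rho = sq_dist(x, centers[kp])
        return sum(p * f(x) * combined_loss(b, loss_matrix[kp][k], rho)
                   for k, (p, f) in enumerate(zip(priors, densities)))
    return min(range(len(centers)), key=g)


# Assumed example: class means 0 and 1, centers deliberately placed at 0 and 3.
priors = [0.5, 0.5]
densities = [lambda x, m=m: gauss_pdf(x[0], m, 0.5) for m in (0.0, 1.0)]
loss_matrix = [[0.0, 1.0], [1.0, 0.0]]
centers = [(0.0,), (3.0,)]
x = (1.2,)
k_self = decide_with_teacher(x, 0.0, priors, densities, loss_matrix, centers)     # nearest center
k_learn = decide_with_teacher(x, 1.0, priors, densities, loss_matrix, centers)    # Bayes decision
```

With b = 0 only the distance to the centers matters, so x = 1.2 goes to the center at 0; with b = 1 only the loss matrix and densities matter, and x goes to the class whose mean is 1.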

It was assumed above that the teacher qualification is known exactly when the neural network optimal model is developed. In practice the teacher qualification is usually known only approximately, for example, in medical diagnostics. In the case of K = K_p classes and neural network solutions, one obtains

R->° [ | 


K 
> |bePilkpk +(1—b2 P(X dj, f're= 0| dx 
kp=1_ (kp) 


k=1 
(x)>0 


The optimal neural network model determined by the system of K inequalities can 
be written in this case in the form 




K 
z Do Pete ® agg! 
yb Prlkkyy — Pid) + (1—BE)| oC bj, )— PC By } E=1_—___> 0 (6.23) 
— Pri! 
k'=1 


where k_p = 1, ..., K.

In this case it is assumed that the subjective teacher qualification does not depend on the class number.

Consequently, (6.23) shows that when b_e = 1 and [a_{kk'}] = A_1 (A_1 is the unit matrix), one deals with the learning mode. In the case b_e = 0 and arbitrary values of a_{kk'}, one deals with the self-learning mode. In the general case b_e = b, the system is adjustable. If the teacher qualification is zero, the system is not adjustable in the learning mode.

All of this indicates the significance of the a priori information required for designing the neural network optimal model. Sometimes such information is not necessary. The amount of a priori information about the form of f'(x/ε) determines the methods of implementing the neural network optimal model, which are presented in Table 6.1.
Chapter 7


Analysis of the Open-Loop Neural Networks 


7.1 
Distribution Laws of Analogous and Discrete Neural Network Errors 


The initial data for the analysis of open-loop neural networks are the given distribution density of the input signal and the structure of the open-loop neural network. The following open-loop neural network structures are usually considered: neurons with two, K_p, and a continuum of solutions; nonlinear neural networks; and multilayer neural networks.

The goal of the open-loop neural network analysis is to derive expressions for the distributions, and the moments of distributions, of intermediate and output neural network signals. This chapter mainly concerns the analysis of distributions and moments of distributions of the neural network errors. The secondary optimization functionals are selected on the basis of the open-loop neural network analysis.

The secondary optimization functional is a functional, expressed through the distribution parameters of the current neural network signals and errors, that is directly minimized by the multilayer neural network under closed-cycle adjustment. The main problem is the creation of a secondary optimization functional corresponding to the given primary optimization criterion. The desired correspondence is that the neural network parameters providing the minima of the primary and secondary functionals coincide.


7.1.1 
Neuron with Two Solutions 


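The transformation performed by the neuron with two solutions can be represented in the following form (x_0 ≡ 1):

$$y(n) = \operatorname{sign}\left[\sum_{i=0}^{N} a_i x_i(n)\right] = \operatorname{sign} g(n) \qquad (7.1)$$

The expressions for the analogous and discrete neuron errors have the form

$$x_g(n) = \varepsilon(n) - g(n); \qquad x_y(n) = \varepsilon(n) - y(n) \qquad (7.2)$$

The input signal distribution function with K = 2 is (Chap. 5)

$$f(x, \varepsilon) = \begin{cases} \frac{1}{4}\left[A_1 f_1(x) + A_2 f_2(x)\right], & \text{if } \varepsilon = 1 \\ \frac{1}{4}\left[B_1 f_1(x) + B_2 f_2(x)\right], & \text{if } \varepsilon = -1 \end{cases}$$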
Here

$$A_1 = [2 + (c_2 - c_1) - b_1(c_1 + c_2)]\,p_1, \qquad A_2 = [2 + (c_1 - c_2) + b_2(c_1 + c_2)]\,p_2,$$

$$B_1 = [2 + (c_2 - c_1) + b_1(c_1 + c_2)]\,p_1, \qquad B_2 = [2 + (c_2 - c_1) - b_2(c_1 + c_2)]\,p_2$$


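The analogous error distribution of the considered neural network is

$$f(x_g) = \frac{1}{4|a_N|} \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} \Big[ A_1 f_1\Big(x_1,\dots,x_{N-1}, \frac{1 - a_0 - x_g}{a_N} - \sum_{i=1}^{N-1} \frac{a_i}{a_N} x_i\Big) + A_2 f_2\Big(x_1,\dots,x_{N-1}, \frac{1 - a_0 - x_g}{a_N} - \sum_{i=1}^{N-1} \frac{a_i}{a_N} x_i\Big)$$
$$+ B_1 f_1\Big(x_1,\dots,x_{N-1}, \frac{-1 - a_0 - x_g}{a_N} - \sum_{i=1}^{N-1} \frac{a_i}{a_N} x_i\Big) + B_2 f_2\Big(x_1,\dots,x_{N-1}, \frac{-1 - a_0 - x_g}{a_N} - \sum_{i=1}^{N-1} \frac{a_i}{a_N} x_i\Big) \Big]\, dx_1 \cdots dx_{N-1} \qquad (7.3)$$

and the discrete error distribution is

$$f(x_y) = \begin{cases} \frac{1}{4}\left[B_1(1-\Phi_1) + B_2(1-\Phi_2)\right], & \text{if } x_y = -2 \\ \frac{1}{4}\left[A_1(1-\Phi_1) + B_1\Phi_1 + A_2(1-\Phi_2) + B_2\Phi_2\right], & \text{if } x_y = 0 \\ \frac{1}{4}\left[A_1\Phi_1 + A_2\Phi_2\right], & \text{if } x_y = 2 \end{cases} \qquad (7.4)$$

Here

$$\Phi_k = \Phi_k\Big(\frac{a_0}{a_N}, \frac{a_1}{a_N}, \dots, \frac{a_{N-1}}{a_N}\Big) = \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} \int_{-\infty}^{-\frac{a_0}{a_N} - \sum_{i=1}^{N-1} \frac{a_i}{a_N} x_i} f_k(x)\,dx_N\,dx_{N-1}\cdots dx_1, \quad k = 1,2,$$

so that, for a_N > 0, Φ_k is the probability that a k-th class pattern falls into the region g(x) < 0.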
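The expressions for the r-th order distribution moments of the analogous and discrete errors of the considered neural network can be represented in the form

$$\alpha_{rg} = \frac{1}{4} \sum_{m=0}^{r} (-1)^m C_r^m \sum_{i_1,\dots,i_m=0}^{N} a_{i_1}\cdots a_{i_m} \left\{ [A_1 + (-1)^{r-m} B_1]\, \nu^{(1)}_{i_1\dots i_m} + [A_2 + (-1)^{r-m} B_2]\, \nu^{(2)}_{i_1\dots i_m} \right\}, \qquad (7.5)$$

$$\nu^{(k)}_{i_1\dots i_m} = \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} x_{i_1}\cdots x_{i_m}\, f_k(x)\,dx,$$

$$\alpha_{ry} = 2^{r-2}\left[(A_1\Phi_1 + A_2\Phi_2) + (-1)^r (B_1 + B_2 - B_1\Phi_1 - B_2\Phi_2)\right] \qquad (7.6)$$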


In the particular case c_1 = c_2 = 1 and b_1 = b_2 = b, the distribution of the analogous error of the considered neural network has the form (7.3) with the substitution

$$A_1 = 2(1-b)p_1; \quad A_2 = 2(1+b)p_2; \quad B_1 = 2(1+b)p_1; \quad B_2 = 2(1-b)p_2$$

The discrete error distribution has the form

$$f(x_y) = \begin{cases} \frac{1}{2}\left[p_1(1+b)(1-\Phi_1) + p_2(1-b)(1-\Phi_2)\right], & \text{if } x_y = -2 \\ \frac{1}{2}\left[1 + b(p_2 - p_1) + 2p_1 b\Phi_1 - 2p_2 b\Phi_2\right], & \text{if } x_y = 0 \\ \frac{1}{2}\left[p_1(1-b)\Phi_1 + p_2(1+b)\Phi_2\right], & \text{if } x_y = 2 \end{cases}$$

In this case the second moment of the neural network discrete error distribution has the following form:

$$\alpha_{2y} = 2\{p_1[1 + b(1 - 2\Phi_1)] + p_2[1 - b(1 - 2\Phi_2)]\},$$

and separately for the pattern assemblages of the first and second classes:

$$\alpha_{2y}^{(1)} = 2[p_1(1+b)(1-\Phi_1) + p_2(1-b)(1-\Phi_2)],$$

$$\alpha_{2y}^{(2)} = 2[p_1(1-b)\Phi_1 + p_2(1+b)\Phi_2]$$
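The discrete-error formulas of the two-solution neuron can be checked mechanically. The sketch takes the three-point distribution of x_y (values −2, 0, 2 with probabilities ¼[B_1(1−Φ_1) + B_2(1−Φ_2)], ¼[A_1(1−Φ_1) + B_1Φ_1 + A_2(1−Φ_2) + B_2Φ_2], ¼[A_1Φ_1 + A_2Φ_2]), computes its moments directly, and compares them with the closed form α_ry = 2^{r−2}[(A_1Φ_1 + A_2Φ_2) + (−1)^r(B_1 + B_2 − B_1Φ_1 − B_2Φ_2)]. It uses the particular-case coefficients A_1 = 2(1−b)p_1, A_2 = 2(1+b)p_2, B_1 = 2(1+b)p_1, B_2 = 2(1−b)p_2; the numeric values of p_k, b and Φ_k are arbitrary test inputs, and the function names are ours.

```python
def coefficients_particular(p1, p2, b):
    """A_k, B_k for c1 = c2 = 1 and b1 = b2 = b."""
    A1, A2 = 2 * (1 - b) * p1, 2 * (1 + b) * p2
    B1, B2 = 2 * (1 + b) * p1, 2 * (1 - b) * p2
    return A1, A2, B1, B2


def discrete_error_distribution(A1, A2, B1, B2, Phi1, Phi2):
    """Probabilities of the discrete error values x_y = -2, 0, 2."""
    return {
        -2: 0.25 * (B1 * (1 - Phi1) + B2 * (1 - Phi2)),
        0: 0.25 * (A1 * (1 - Phi1) + B1 * Phi1 + A2 * (1 - Phi2) + B2 * Phi2),
        2: 0.25 * (A1 * Phi1 + A2 * Phi2),
    }


def moment_direct(dist, r):
    """r-th moment computed from the three-point distribution."""
    return sum(v ** r * p for v, p in dist.items())


def moment_closed_form(A1, A2, B1, B2, Phi1, Phi2, r):
    """alpha_ry = 2^(r-2) [(A1 Phi1 + A2 Phi2) + (-1)^r (B1 + B2 - B1 Phi1 - B2 Phi2)]."""
    return 2 ** (r - 2) * ((A1 * Phi1 + A2 * Phi2)
                           + (-1) ** r * (B1 + B2 - B1 * Phi1 - B2 * Phi2))


p1, p2, b, Phi1, Phi2 = 0.4, 0.6, 0.7, 0.8, 0.3
A1, A2, B1, B2 = coefficients_particular(p1, p2, b)
dist = discrete_error_distribution(A1, A2, B1, B2, Phi1, Phi2)
alpha2_text = 2 * (p1 * (1 + b * (1 - 2 * Phi1)) + p2 * (1 - b * (1 - 2 * Phi2)))
```

The same check shows why only the first two discrete moments matter: the closed form gives α_ry = 2^{r−1} α_1y for odd r and α_ry = 2^{r−2} α_2y for even r.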
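7.1.2
Neuron with a Solution Continuum


The transformation performed by the neuron with a solution continuum in the learning mode can be represented in the following form:

$$y(n) = F\left[\sum_{i=0}^{N} a_i x_i(n)\right] = F[g(n)]$$

In the case of a neural network input class continuum,

$$f(x, \varepsilon) = f'(x_1,\dots,x_N/\varepsilon)\, f_\varepsilon(\varepsilon)$$

The joint distribution of the signal ε(n) and the analogous output signal g(n) has the form

$$f(g, \varepsilon) = \frac{1}{|a_N|} \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} f'\Big(x_1,\dots,x_{N-1}, \frac{g - a_0}{a_N} - \sum_{i=1}^{N-1} \frac{a_i}{a_N} x_i \Big/ \varepsilon\Big)\, f_\varepsilon(\varepsilon)\, dx_{N-1}\cdots dx_1$$

The neural network analogous error distribution is

$$f(x_g) = \frac{1}{|a_N|} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} f'\Big(x_1,\dots,x_{N-1}, \frac{\varepsilon - x_g - a_0}{a_N} - \sum_{i=1}^{N-1} \frac{a_i}{a_N} x_i \Big/ \varepsilon\Big)\, f_\varepsilon(\varepsilon)\, dx_{N-1}\cdots dx_1\, d\varepsilon$$

Consequently, the expression for the r-th order moment of the analogous error is

$$\alpha_{rg} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} \Big[\varepsilon - a_0 - \sum_{i=1}^{N} a_i y_i\Big]^r f'(y/\varepsilon)\, f_\varepsilon(\varepsilon)\, dy\, d\varepsilon \qquad (7.7)$$

For a strictly monotone F, the neural network discrete error distribution is

$$f(x_y) = \frac{1}{|a_N|} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} f'\Big(x_1,\dots,x_{N-1}, \frac{F^{-1}(\varepsilon - x_y) - a_0}{a_N} - \sum_{i=1}^{N-1} \frac{a_i}{a_N} x_i \Big/ \varepsilon\Big)\, f_\varepsilon(\varepsilon) \left|\frac{dF^{-1}(\varepsilon - x_y)}{d(\varepsilon - x_y)}\right| dx_{N-1}\cdots dx_1\, d\varepsilon \qquad (7.7a)$$

Consequently,

$$\alpha_{ry} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} \Big[\varepsilon - F\Big(a_0 + \sum_{i=1}^{N} a_i y_i\Big)\Big]^r f'(y/\varepsilon)\, f_\varepsilon(\varepsilon)\, dy\, d\varepsilon \qquad (7.8)$$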
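In the particular case of learning to recognize two pattern classes, one obtains

$$f(x_y) = \frac{p_1}{|a_N|} \int\!\!\cdots\!\int f_1\Big(x_1,\dots,x_{N-1}, \frac{F^{-1}(-1 - x_y) - a_0}{a_N} - \sum_{i=1}^{N-1} \frac{a_i}{a_N} x_i\Big) \left|\frac{dF^{-1}(-1 - x_y)}{d(-1 - x_y)}\right| dx_{N-1}\cdots dx_1$$
$$+ \frac{p_2}{|a_N|} \int\!\!\cdots\!\int f_2\Big(x_1,\dots,x_{N-1}, \frac{F^{-1}(1 - x_y) - a_0}{a_N} - \sum_{i=1}^{N-1} \frac{a_i}{a_N} x_i\Big) \left|\frac{dF^{-1}(1 - x_y)}{d(1 - x_y)}\right| dx_{N-1}\cdots dx_1 \qquad (7.8)$$

$$\alpha_{ry} = p_1 \int\!\!\cdots\!\int \Big[-F\Big(a_0 + \sum_{i=1}^{N} a_i y_i\Big) - 1\Big]^r f_1(y)\,dy + p_2 \int\!\!\cdots\!\int \Big[-F\Big(a_0 + \sum_{i=1}^{N} a_i y_i\Big) + 1\Big]^r f_2(y)\,dy \qquad (7.9)$$

and separately for the pattern assemblages of the first and second classes

$$\alpha_{ry}^{(1)} = \int\!\!\cdots\!\int \Big[-F\Big(a_0 + \sum_{i=1}^{N} a_i y_i\Big) - 1\Big]^r f_1(y)\,dy, \qquad \alpha_{ry}^{(2)} = \int\!\!\cdots\!\int \Big[-F\Big(a_0 + \sum_{i=1}^{N} a_i y_i\Big) + 1\Big]^r f_2(y)\,dy \qquad (7.10)$$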
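The latter expressions allow one to obtain the expressions for some particular cases:

$$F(g) = \operatorname{sign} g; \qquad F(g) = \begin{cases} 1, & \text{if } g > \Delta a \\ 0, & \text{if } -\Delta a \le g \le \Delta a \\ -1, & \text{if } g < -\Delta a \end{cases}; \qquad F(g) = \begin{cases} 1, & \text{if } g > \Delta a \\ g/\Delta a, & \text{if } -\Delta a \le g \le \Delta a \\ -1, & \text{if } g < -\Delta a \end{cases}$$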
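The three activation functions listed above, together with a Monte Carlo estimate of the two-class discrete-error moment α_ry = p_1 E[(−1 − F(g))^r | class 1] + p_2 E[(1 − F(g))^r | class 2] for a one-feature neuron g = a_0 + a_1 x, can be sketched as follows. The Gaussian classes, the sample size and the convention sign(0) = −1 are our assumptions; class 1 carries the label ε = −1 and class 2 the label ε = +1, as in the formulas above.

```python
import random


def F_sign(g):
    # Assumption: sign(0) is taken as -1; the text leaves this value unspecified.
    return 1.0 if g > 0 else -1.0


def F_dead_zone(delta):
    """Three-level activation: 1, 0 or -1 with a dead zone of half-width delta."""
    def F(g):
        if g > delta:
            return 1.0
        if g < -delta:
            return -1.0
        return 0.0
    return F


def F_saturation(delta):
    """Linear between -delta and delta, saturated at +-1 outside."""
    def F(g):
        if g > delta:
            return 1.0
        if g < -delta:
            return -1.0
        return g / delta
    return F


def alpha_ry_mc(F, r, a0, a1, p1, m1, m2, std, n=20000, seed=1):
    """Monte Carlo estimate of the two-class discrete-error moment for g = a0 + a1 x."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        if rng.random() < p1:
            eps, x = -1.0, rng.gauss(m1, std)   # class 1, label -1
        else:
            eps, x = 1.0, rng.gauss(m2, std)    # class 2, label +1
        total += (eps - F(a0 + a1 * x)) ** r
    return total / n


# Well-separated classes and a correctly placed threshold: the error moment is small.
alpha2_sign = alpha_ry_mc(F_sign, 2, 0.0, 1.0, 0.5, -2.0, 2.0, 0.5)
```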
7.1.3
Analysis of a Neuron with K_p Solutions


The joint probability distribution of the input signal for the recognition system of K pattern classes in the learning mode has the form

$$f(x, \varepsilon) = p_k f_k(x), \quad \text{if } \varepsilon = k, \quad k = 1,\dots,K$$

In this case,

$$y(n) = k_p, \quad \text{if } h_{k_p} < g(n) \le h_{k_p+1} \qquad (h_1 = -\infty,\ h_{K_p+1} = \infty),$$

where h_{k_p} denotes the neuron thresholds. For the assemblage of the k-th class patterns,

$$f_k(x_g) = \frac{1}{|a_N|} \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} f_k\Big(x_1,\dots,x_{N-1}, \frac{k - a_0 - x_g}{a_N} - \sum_{i=1}^{N-1} \frac{a_i}{a_N} x_i\Big)\, dx_{N-1}\cdots dx_1$$

The analogous error distribution then has the following form:

$$f(x_g) = \sum_{k=1}^{K} p_k f_k(x_g)$$

The distribution of the discrete output signal of the considered neuron type for the k-th class can be obtained in the following form:

$$f_k(y) = \Phi_k(h_{k_p+1}) - \Phi_k(h_{k_p}), \quad \text{if } y = k_p \qquad (7.10a)$$

Similarly to the previous case, here

$$\Phi_k(h) = \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} \int_{-\infty}^{\frac{h - a_0}{a_N} - \sum_{i=1}^{N-1} \frac{a_i}{a_N} x_i} f_k(x)\,dx_N\,dx_{N-1}\cdots dx_1,$$

i.e., for a_N > 0, Φ_k(h) is the probability that g(n) < h for a k-th class pattern.




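Consequently, the discrete error distribution is

$$f(x_y) = \sum_{k=1}^{K} p_k \left[\Phi_k(h_{k - x_y + 1}) - \Phi_k(h_{k - x_y})\right], \quad x_y = k - k_p,$$

where the term of the k-th class is taken as zero if k − x_y lies outside 1, ..., K_p.

The expressions for the r-th order moments of distributions for the analogous and discrete errors can be represented in the form

$$\alpha_{rg} = \sum_{k=1}^{K} p_k \sum_{m=0}^{r} (-1)^m C_r^m\, k^{r-m} \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} \Big[\sum_{i=0}^{N} a_i x_i\Big]^m f_k(x)\,dx,$$

$$\alpha_{ry} = \sum_{k=1}^{K} p_k \sum_{k'=1}^{K_p} (k - k')^r \left[\Phi_k(h_{k'+1}) - \Phi_k(h_{k'})\right]$$

The expression for the r-th moment of distribution for the analogous error can also be obtained directly:

$$\alpha_{rg} = \sum_{k=1}^{K} \frac{p_k}{|a_N|} \int_{-\infty}^{\infty} x_g^r \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} f_k\Big(x_1,\dots,x_{N-1}, \frac{k - a_0 - x_g}{a_N} - \sum_{i=1}^{N-1} \frac{a_i}{a_N} x_i\Big)\, dx_g\, dx_{N-1}\cdots dx_1$$

After the change of variables x_1 = y_1, ..., x_{N−1} = y_{N−1}, y_N = (k − a_0 − x_g)/a_N − Σ_{i=1}^{N−1} (a_i/a_N) y_i, one obtains

$$\alpha_{rg} = \sum_{k=1}^{K} p_k \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} \Big[k - a_0 - \sum_{i=1}^{N} a_i y_i\Big]^r f_k(y)\,dy = \sum_{k=1}^{K} p_k \sum_{m=0}^{r} C_r^m\, k^{r-m} \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} \Big[-\sum_{i=0}^{N} a_i y_i\Big]^m f_k(y)\,dy \qquad (y_0 \equiv 1)$$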
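For a neuron with K_p solutions, the discrete-error moments reduce to sums over class/output pairs weighted by interval probabilities Φ_k(h_{k'+1}) − Φ_k(h_{k'}). The sketch assumes y = k_p when h_{k_p} < g ≤ h_{k_p+1} (thresholds written h here for illustration), one feature with g = x, and Gaussian classes f_k = N(k, σ²). It computes α_ry = Σ_k p_k Σ_{k'} (k − k')^r [Φ_k(h_{k'+1}) − Φ_k(h_{k'})] in closed form and cross-checks it by simulation. All names are hypothetical.

```python
import bisect
import math
import random


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def quantize(g, inner_thresholds):
    """y = k_p when h_{k_p} < g <= h_{k_p+1}; inner_thresholds = [h_2, ..., h_{K_p}]."""
    return bisect.bisect_left(inner_thresholds, g) + 1


def alpha_ry_analytic(r, priors, means, std, inner_thresholds):
    """sum_k p_k sum_{k'} (k - k')^r [Phi_k(h_{k'+1}) - Phi_k(h_{k'})] for g = x, f_k = N(means[k-1], std^2)."""
    h = [-math.inf] + list(inner_thresholds) + [math.inf]
    total = 0.0
    for k, (p, m) in enumerate(zip(priors, means), start=1):
        for kp in range(1, len(h)):
            prob = normal_cdf((h[kp] - m) / std) - normal_cdf((h[kp - 1] - m) / std)
            total += p * (k - kp) ** r * prob
    return total


def alpha_ry_mc(r, priors, means, std, inner_thresholds, n=40000, seed=2):
    """Simulation of the same moment: draw a class, a pattern, quantize, average (k - y)^r."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        k = rng.choices(range(1, len(priors) + 1), weights=priors)[0]
        x = rng.gauss(means[k - 1], std)
        total += (k - quantize(x, inner_thresholds)) ** r
    return total / n


priors, means, std, inner = [1 / 3] * 3, [1.0, 2.0, 3.0], 0.4, [1.5, 2.5]
```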
7.1.4
Analysis of a Pattern Recognition System with a Nonlinear Divisional Surface


It was shown in [7-4] that a pattern recognition system with a nonlinear divisional surface can be represented by an equivalent system consisting of an inertialess layer of nonlinear transformations followed by a neuron. It can be shown that if the nonlinear transformation layer forms the vector components (x_1, ..., x_N, {z_{i_1 i_2}}, ..., {z_{i_1...i_r}}) from the vector components (x_1, ..., x_N), where i_1, ..., i_r = 1, ..., N and z_{i_1...i_k} = x_{i_1} ··· x_{i_k}, then the distribution density of this layer's output signal can be expressed through the distribution density f(x) in the following way:

$$f'(x_1,\dots,x_N, \{z_{i_1 i_2}\},\dots,\{z_{i_1\dots i_r}\}) = \begin{cases} f(x), & \text{for all } i_k\ (k = 1,\dots,r) \text{ at which } z_{i_1\dots i_k} = x_{i_1}\cdots x_{i_k} \\ 0, & \text{for all } i_k \text{ at which } z_{i_1\dots i_k} \ne x_{i_1}\cdots x_{i_k} \end{cases}$$

The expression for the second moment of the discrete error distribution of the considered nonlinear system has the form

$$\alpha_{2y} = 4\left[\Phi'_2 p_2 + p_1 - \Phi'_1 p_1\right],$$

where

$$\Phi'_k = \int_{S'(x')<0} f'_k(x')\,dx' \quad \text{and} \quad S'(x') = -a_0 + \sum_{i=1}^{N^*} a_i x'_i,$$

N* being the dimensionality of the vector x' formed by the nonlinear layer. It must be taken into account that the expression

$$S(x) = -a_0 + \sum_{i=1}^{N} a_i x_i = 0$$

determines a linear divisional surface in the initial feature space. Let us determine how the divisional surface in the initial feature space changes as the order r of the nonlinear transformation increases. In the case of the second-order transformation,

$$f'_k(x, z) = \begin{cases} f_k(x), & \text{if } z_{i_1 i_2} = x_{i_1} x_{i_2} \\ 0, & \text{if } z_{i_1 i_2} \ne x_{i_1} x_{i_2} \end{cases}$$

and

$$\Phi'_k = \int_{S'(x')<0} f_k(x) \prod_{i_1, i_2 = 1}^{N} \delta(z_{i_1 i_2} - x_{i_1} x_{i_2})\, dx_1 \cdots dx_N\, dz = \int_{-a_0 + \sum_{i=1}^{N} a_i x_i + \sum_{i_1=1}^{N} \sum_{i_2=1}^{N} a_{i_1 i_2} x_{i_1} x_{i_2} < 0} f_k(x)\,dx$$

Consequently, if r = 2, then the equivalent divisional surface in the initial feature space is a second-order surface whose coefficients are uniquely determined by the coefficients of the output neuron that follows the input layer of nonlinear transformations. In the r-th order transformation,

$$\alpha_{2y} = 4\left\{ p_1 + \int_{-a_0 + \sum_{k=1}^{r} \sum_{i_1,\dots,i_k=1}^{N} a_{i_1\dots i_k} x_{i_1}\cdots x_{i_k} < 0} \left[p_2 f_2(x) - p_1 f_1(x)\right] dx \right\}$$

This proves the equivalence (by the average risk function criterion) of representing the pattern recognition system with a nonlinear divisional surface as a unit of nonlinear transformations followed by a neuron.
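The equivalence for r = 2 can be seen in a few lines: a linear neuron applied to the lifted vector (x_1, ..., x_N, z_11, z_12, ..., z_NN) with z_{i_1 i_2} = x_{i_1} x_{i_2} computes exactly the quadratic form −a_0 + Σ a_i x_i + Σ Σ a_{i_1 i_2} x_{i_1} x_{i_2} in the original features. The sketch below checks this numerically; the coefficient values are arbitrary, and the unit-circle example (a_0 = 1, a_{ii} = 1) is only an illustration of a second-order divisional surface.

```python
import itertools
import random


def lift(x):
    """Second-order layer: (x_1, ..., x_N, z_11, z_12, ..., z_NN) with z_ij = x_i x_j."""
    return list(x) + [xi * xj for xi, xj in itertools.product(x, repeat=2)]


def neuron_S(x_lifted, a0, weights):
    """Linear neuron on the lifted vector: S'(x') = -a0 + sum_i a_i x'_i."""
    return -a0 + sum(w * v for w, v in zip(weights, x_lifted))


def quadratic_S(x, a0, a, A):
    """Equivalent surface in the initial space: -a0 + sum a_i x_i + sum a_ij x_i x_j."""
    n = len(x)
    return (-a0 + sum(a[i] * x[i] for i in range(n))
            + sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n)))


def flatten(a, A):
    """Neuron weights in the same order as lift(): linear terms, then A row by row."""
    return list(a) + [A[i][j] for i in range(len(a)) for j in range(len(a))]


# Arbitrary coefficients with cross terms, used to compare both computations.
a0, a, A = 1.0, [0.3, -0.7], [[1.0, 0.5], [0.0, 2.0]]
weights = flatten(a, A)

# Unit circle x1^2 + x2^2 = 1 as a second-order divisional surface.
circle = (1.0, [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```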


7.2
Selection of the Secondary Optimization Functional


Let us consider the secondary optimization functional related to the moments of distribution of the analogous and discrete errors for the neural network with two solutions. The general requirements for the neural network secondary optimization functional were mentioned in the introduction. The functional parameters required for the iterative search procedure must be easy to measure and evaluate. The functional must have a simple form with respect to the adjustable neural network coefficients. It must be minimal at the same values of the adjustable neural network parameters that provide an extremum of some primary optimization functional.

The analysis of expressions (7.5) and (7.6) for the moments of distribution of the analogous and discrete errors results in the following conclusions [7-2, 7-3]:
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. The odd-numbered moments of neural network analogous and discrete error dis- 


tribution in the learning mode cannot be used as a secondary optimization func- 
tional. However, their absolute values can be used as such functionals. 


. The even-numbered moments of the aforementioned distributions can be used as 


the secondary optimization functional. In the case of discrete error, only consider- 
ation of the first and second moments order must be performed because the mo- 
ments of higher order are proportional to @,,. 


. The main goal of this chapter is the analysis of the primary optimization criterion, the 


a priori information about the input signal characteristics and the loss matrix corre- 
sponding to the minimization of the selected secondary optimization functional. 


. The analysis of the expressions for |o,,| and o,, in the case of a neuron shows that 


the minimization of these secondary optimization functionals is equivalent to the 
minimization of the average risk function under consideration of only the first order 
moments of distribution for pattern assemblages of different classes. It is consid- 
ered that the emergence of a priori probability activation function patterns is the 
same for all classes, and the following restrictions upon the loss matrix coefficients 
takes place: (1,,- 1,,) = (l, - ). 


. The analysis of the expression for the absolute value of the first moment of the discrete 


neural network error distribution 
ag | =2|pyP) — pi + pD| 
shows that the |o,,| minimization results in satisfying the optimization criterion for 


the average risk function under the condition that average risk function components 
are equal for both classes and the following restriction upon coefficients of matrix L: 


by-by=hi-lby 


. The analysis of the expression for the second moment of the discrete neural net- 


work error distribution @,,= 4|P,®, + P; - P;®,| shows that the a, minimization 
results in satisfying the optimization criterion for the average risk function under 
the previous conditions. 


7. Additional limitations related to the finite number of considered moments for $|\alpha_{1g}|$ and $\alpha_{2g}$, and the condition $p_1 r_1 = p_2 r_2$ for $|\alpha_{1g}|$, make these functionals single-extremum under a limited structure of the open-loop system and a multi-modal input signal distribution. In the general case, the functional $\alpha_{2g}$ can be multi-extremal, its local minima corresponding to local minima of the average risk function when $l_{12} - l_{11} = l_{21} - l_{22}$.


8. In the case of an arbitrary open-loop neural network structure (an arbitrary divisional surface), according to Sect. 7.1 with $b_1 = b_2 = 1$ and $c_1 = c_2 = 1$, one obtains

$$ \alpha_{2g} = 4\left[p_2\Phi_2 + p_1 - p_1\Phi_1\right], $$

where

$$ \Phi_k = M_k\left[S(x)\right] = \int\cdots\int_{S(x)<0} f_k(x)\,dx, \qquad k = 1, 2. $$

Here the functional $\alpha_{2g}$ is, up to an additive constant, proportional to the average risk function for an arbitrary neural network structure (two pattern classes, two solutions) under the aforementioned limitations upon matrix L.

9. These secondary optimization functionals are of interest despite the limitations, because they lead to a fairly simple implementation of the adjustable system with closed-cycle adjustment and can be useful in the design of neural networks with a flexible structure.


7.3
About Selection of the Secondary Optimization Functional
in the “Adaline” System


The closed-cycle adjustment methods presented in the works of Widrow [7-1] for the so-called “Adaline” systems are based on the minimization of the second moment of the analog error distribution. He used the following rule:


It can be shown by geometric arguments that the mean square of the discrete error is a monotone function of the mean square of the analog error, and that their minimization is a minimization of the average risk function.


This rule is not correct, because for Gaussian distributions with different covariance matrices the average risk function is minimized by a second-order divisional surface. Let us consider a system consisting of one neuron. Then the optimal solutions by the criteria of $\alpha_2$ and $\alpha_{2g}$ minimization coincide only when the covariance matrices of the first and second pattern classes are the same [7-2].

Let us analyze the extremum properties of the second-order moments of the analog and discrete errors of a one-dimensional neuron in order to estimate the difference between the optimal solutions by the criteria of $\alpha_2$ and $\alpha_{2g}$ minimization.

To do that, one must (a) calculate the coefficients $a_0$ and $a_1$ minimizing $\alpha_2$; (b) calculate the coefficients $a_0'$ and $a_1'$ minimizing $\alpha_{2g}$; (c) calculate the difference

$$ \Delta\alpha_{2g} = \alpha_{2g}(a_0, a_1) - \alpha_{2g}(a_0', a_1'). $$
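Steps (a)-(c) can be illustrated numerically for a one-dimensional neuron. The sketch below is a hypothetical illustration, not the computation of [7-2]: it assumes equal a priori probabilities $p_1 = p_2 = 0.5$, equal losses for both errors (so that minimizing $\alpha_{2g}$ amounts to minimizing the error probability), the class $\varepsilon = -1$ distributed as $N(m_1, \sigma_1^2)$ and the class $\varepsilon = 1$ as $N(m_2, \sigma_2^2)$ with $m_1 < m_2$, and a single-threshold decision $y = 1$ for $x > t$. All function names are made up for the example. Under these assumptions the least-squares (analog error) threshold always lies at $(m_1 + m_2)/2$, while the threshold minimizing $\alpha_{2g}$ satisfies $f_1(t) = f_2(t)$, so $\Delta R$ vanishes only for equal variances.

```python
import math


def normal_pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def normal_cdf(x, m, s):
    return 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))


def lms_threshold(m1, m2, s1, s2):
    """Step (a): threshold of y = a1*x + a0 minimizing alpha_2 = E[(eps - y)^2]."""
    mean_x = 0.5 * (m1 + m2)
    var_x = 0.5 * (s1 ** 2 + s2 ** 2) + 0.25 * (m2 - m1) ** 2
    cov_x_eps = 0.5 * (m2 - m1)        # E[eps] = 0 because p1 = p2 = 0.5
    a1 = cov_x_eps / var_x
    a0 = -a1 * mean_x                  # a0 = E[eps] - a1 * E[x]
    return -a0 / a1                    # point where a1*x + a0 = 0


def error_probability(t, m1, m2, s1, s2):
    """R(t) with equal losses: class 1 above t or class 2 below t."""
    return 0.5 * (1.0 - normal_cdf(t, m1, s1)) + 0.5 * normal_cdf(t, m2, s2)


def min_risk_threshold(m1, m2, s1, s2, iters=200):
    """Step (b): root of f1(t) = f2(t) between the means, found by bisection."""
    lo, hi = m1, m2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if normal_pdf(mid, m1, s1) > normal_pdf(mid, m2, s2):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def delta_r(m1, m2, s1, s2):
    """Step (c): extra risk of the alpha_2 solution over the alpha_2g solution."""
    t_analog = lms_threshold(m1, m2, s1, s2)
    t_discrete = min_risk_threshold(m1, m2, s1, s2)
    return (error_probability(t_analog, m1, m2, s1, s2)
            - error_probability(t_discrete, m1, m2, s1, s2))


for s1 in (0.5, 1.0, 2.0):
    print(f"sigma_1 = {s1}: delta R = {delta_r(0.0, 2.0, s1, 1.0):.5f}")
```

Varying `s1` traces a curve of the same kind as in Fig. 7.1, although the parameter values here are arbitrary.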

Figure 7.1 shows the dependence $\Delta R = \Delta\alpha_{2g}(\sigma_1)$ for a particular case from [7-2]: two normal distributions with fixed mathematical expectations, where the variance $\sigma_1$ of the distribution of one of the classes changes. The limitation of the $\alpha_2$ minimum criterion is well illustrated by the example of multi-modal distributions (Fig. 7.2). Here the neuron thresholds $a_0$ and $a_0'$ optimized by the $\alpha_2$ and $\alpha_{2g}$ minimum criteria are shown. The crosshatched area is the difference $\Delta R$ between the two criteria.

Fig. 7.1. Comparison of the $\alpha_2$ and $\alpha_{2g}$ minimum criteria (plot of $\Delta R$)

Fig. 7.2. Comparison of the $\alpha_2$ and $\alpha_{2g}$ minimum criteria for multi-modal distributions


7.4
Development of the Secondary Optimization Functionals
Corresponding to the Given Primary Optimization Criterion


The secondary optimization functionals are developed below for open-loop neural networks of arbitrary structure with $K_p = K = 2$, i.e., with a divisional surface of arbitrary form.


7.4.1 
The Average Risk Function Minimum Criterion 


The main problem consists in the selection of a transformation of the neural network discrete error $x_g(n) = \varepsilon(n) - y(n)$ that yields a transformed discrete error $x_g^*(n)$ whose second moment of distribution equals the average risk function. Let us multiply $x_g(n)$ by A if $\varepsilon(n) = -1$ and by B if $\varepsilon(n) = 1$, and then add C to the result. Let us determine the parameters (A, B, C) so that the second moment of the distribution $f(x_g^*)$ is equal to R:


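$$ f(x_g^*) = \begin{cases} p_1(1-\Phi_1), & x_g^* = C - 2A, \\ p_1\Phi_1 + p_2(1-\Phi_2), & x_g^* = C, \\ p_2\Phi_2, & x_g^* = 2B + C, \end{cases} \qquad (7.11) $$

where $\Phi_k = \int\cdots\int_{S(x)<0} f_k(x)\,dx$ is the probability that a pattern of the k-th class falls into the region $S(x) < 0$ (output $y = -1$). Then

$$ \alpha_{2g} = p_1(2A - C)^2 + p_2C^2 + p_1\Phi_1\left(4AC - 4A^2\right) + p_2\Phi_2\left(4BC + 4B^2\right), \qquad (7.12) $$

$$ R = p_1 l_{12} + p_2 l_{22} + p_1\left(l_{11} - l_{12}\right)\Phi_1 + p_2\left(l_{21} - l_{22}\right)\Phi_2. \qquad (7.12a) $$

Equating the constant terms and the coefficients of $\Phi_1$ and $\Phi_2$ in (7.12) and (7.12a), and taking into account that $p_1 + p_2 = 1$, one obtains

$$ C = \sqrt{p_1 l_{11} + p_2 l_{22}}, \qquad A = \frac{1}{2}\left[C + \sqrt{C^2 + l_{12} - l_{11}}\right], \qquad B = \frac{1}{2}\left[\sqrt{C^2 + l_{21} - l_{22}} - C\right]. \qquad (7.13) $$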
To obtain parameters A, B, C providing the coincidence of $\alpha_{2g}$ and R up to the constant (direct-current) component $(p_1 l_{11} + p_2 l_{22})$, one must put

$$ C = 0, \qquad A = \frac{1}{2}\sqrt{l_{12} - l_{11}}, \qquad B = \frac{1}{2}\sqrt{l_{21} - l_{22}}. \qquad (7.14) $$


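It is possible to use the following transformation Z of the discrete error:

$$ x_g^*(n) = \begin{cases} Z_{12}, & x_g = -2,\ \varepsilon = -1, \\ Z_{11}, & x_g = 0,\ \varepsilon = -1, \\ Z_{22}, & x_g = 0,\ \varepsilon = 1, \\ Z_{21}, & x_g = 2,\ \varepsilon = 1. \end{cases} $$

In this case,

$$ f(x_g^*) = \begin{cases} p_1(1-\Phi_1), & x_g^* = Z_{12}, \\ p_1\Phi_1, & x_g^* = Z_{11}, \\ p_2(1-\Phi_2), & x_g^* = Z_{22}, \\ p_2\Phi_2, & x_g^* = Z_{21}, \end{cases} \qquad (7.14a) $$

and the conditions for the coincidence of $\alpha_{2g}$ and R are

$$ Z_{kk_p}^2 = l_{kk_p}, \qquad k, k_p = 1, 2. \qquad (7.15) $$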
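The matching conditions (7.13)-(7.15) can be checked numerically by building the discrete distributions (7.11) and (7.14a) and comparing their second moments with the average risk (7.12a). This is a minimal sketch with arbitrary example values of $p_1$, $\Phi_1$, $\Phi_2$ and of the loss matrix; it assumes $l_{12} \ge l_{11}$ and $l_{21} \ge l_{22}$ so that the square roots are real, and all helper names are hypothetical.

```python
import math


def risk(p1, l, phi1, phi2):
    """Average risk (7.12a); l = ((l11, l12), (l21, l22)) and p2 = 1 - p1."""
    (l11, l12), (l21, l22) = l
    p2 = 1.0 - p1
    return p1 * l12 + p2 * l22 + p1 * (l11 - l12) * phi1 + p2 * (l21 - l22) * phi2


def second_moment(values_probs):
    """Second moment of a discrete distribution given as [(value, prob), ...]."""
    return sum(v * v * p for v, p in values_probs)


def abc_distribution(p1, phi1, phi2, A, B, C):
    """Distribution (7.11): error scaled by A (eps = -1) or B (eps = 1), plus C."""
    p2 = 1.0 - p1
    return [(C - 2.0 * A, p1 * (1.0 - phi1)),
            (C, p1 * phi1 + p2 * (1.0 - phi2)),
            (2.0 * B + C, p2 * phi2)]


def z_distribution(p1, phi1, phi2, Z):
    """Distribution (7.14a); Z[(k, k_p)] is the value for class k and solution k_p."""
    p2 = 1.0 - p1
    return [(Z[1, 2], p1 * (1.0 - phi1)), (Z[1, 1], p1 * phi1),
            (Z[2, 2], p2 * (1.0 - phi2)), (Z[2, 1], p2 * phi2)]


def abc_from_losses(p1, l):
    """Parameters (7.13); assumes l12 >= l11 and l21 >= l22."""
    (l11, l12), (l21, l22) = l
    C = math.sqrt(p1 * l11 + (1.0 - p1) * l22)
    A = 0.5 * (C + math.sqrt(C * C + l12 - l11))
    B = 0.5 * (math.sqrt(C * C + l21 - l22) - C)
    return A, B, C


p1, losses = 0.3, ((0.5, 2.0), (5.0, 1.0))
phi1, phi2 = 0.8, 0.1
A, B, C = abc_from_losses(p1, losses)
Z = {(k, kp): math.sqrt(losses[k - 1][kp - 1]) for k in (1, 2) for kp in (1, 2)}  # (7.15)
print(risk(p1, losses, phi1, phi2),
      second_moment(abc_distribution(p1, phi1, phi2, A, B, C)),
      second_moment(z_distribution(p1, phi1, phi2, Z)))
```

The three printed numbers should agree, for any $\Phi_1$, $\Phi_2$ in $[0, 1]$.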


7.4.2
Minimum Criterion for R under the Condition $p_1 r_1 = p_2 r_2$


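The minimization of R under the condition $p_1 r_1 = p_2 r_2$, i.e., under the condition

$$ p_1 l_{11}\Phi_1 + p_1 l_{12}(1-\Phi_1) - p_2 l_{21}\Phi_2 - p_2 l_{22}(1-\Phi_2) = 0, \qquad (7.16) $$

is equivalent to the minimization of the Lagrangian functional

$$ R^* = \left[p_1 l_{11}\Phi_1 + p_1 l_{12}(1-\Phi_1)\right](1+\lambda) + \left[p_2 l_{21}\Phi_2 + p_2 l_{22}(1-\Phi_2)\right](1-\lambda). \qquad (7.17) $$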


The conditions for the parameters (A, B, C) providing the coincidence of $\alpha_{2g}$ and $R^*$ are

$$ C = \sqrt{p_1 l_{11}(1+\lambda) + p_2 l_{22}(1-\lambda)}, \qquad A = \frac{1}{2}\left[C + \sqrt{C^2 + (l_{12} - l_{11})(1+\lambda)}\right], \qquad B = \frac{1}{2}\left[\sqrt{C^2 + (l_{21} - l_{22})(1-\lambda)} - C\right]. \qquad (7.18) $$
The left side of (7.16) represents the gradient of $R^*$ with respect to $\lambda$. This value can be estimated as the first moment of the discrete error transformed with parameters $(A_1, B_1, C_1)$. These parameters are obtained in the following way. It follows from (7.11) that

$$ \alpha_{1g} = p_1(C_1 - 2A_1) + p_2C_1 + 2A_1 p_1\Phi_1 + 2B_1 p_2\Phi_2. \qquad (7.19) $$

The comparison between (7.19) and (7.16) gives

$$ A_1 = \frac{l_{11} - l_{12}}{2}, \qquad B_1 = \frac{l_{22} - l_{21}}{2}, \qquad C_1 = p_1 l_{11} - p_2 l_{22}. \qquad (7.20) $$


The use of the Z-transformation introduced above and the comparison of (7.14a) and (7.17) give the following conditions for their coincidence:

$$ Z_{12} = \sqrt{l_{12}(1+\lambda)}, \qquad Z_{22} = \sqrt{l_{22}(1-\lambda)}, \qquad Z_{11} = \sqrt{l_{11}(1+\lambda)}, \qquad Z_{21} = \sqrt{l_{21}(1-\lambda)}. \qquad (7.21) $$

The transformation of the discrete error whose first moment of distribution equals the left side of (7.16) has the following parameters:

$$ Z_{12} = l_{12}, \qquad Z_{22} = -l_{22}, \qquad Z_{21} = -l_{21}, \qquad Z_{11} = l_{11}. \qquad (7.22) $$
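The same kind of check applies to the Lagrangian criterion of this subsection: the parameters (7.20) and the Z-transformation (7.22) should reproduce the left side of (7.16) as a first moment, and (7.21) should reproduce $R^*$ as a second moment. A minimal sketch with arbitrary example values and hypothetical helper names, assuming $|\lambda| \le 1$ so that (7.21) stays real:

```python
import math


def moments(values_probs):
    """(first, second) moments of a discrete distribution [(value, prob), ...]."""
    m1 = sum(v * p for v, p in values_probs)
    m2 = sum(v * v * p for v, p in values_probs)
    return m1, m2


def abc_distribution(p1, phi1, phi2, A, B, C):   # distribution (7.11)
    p2 = 1.0 - p1
    return [(C - 2.0 * A, p1 * (1.0 - phi1)),
            (C, p1 * phi1 + p2 * (1.0 - phi2)),
            (2.0 * B + C, p2 * phi2)]


def z_distribution(p1, phi1, phi2, Z):           # distribution (7.14a)
    p2 = 1.0 - p1
    return [(Z[1, 2], p1 * (1.0 - phi1)), (Z[1, 1], p1 * phi1),
            (Z[2, 2], p2 * (1.0 - phi2)), (Z[2, 1], p2 * phi2)]


def equal_risk_check(p1, l, phi1, phi2, lam):
    (l11, l12), (l21, l22) = l
    p2 = 1.0 - p1
    p1r1 = p1 * (l11 * phi1 + l12 * (1.0 - phi1))
    p2r2 = p2 * (l21 * phi2 + l22 * (1.0 - phi2))
    grad = p1r1 - p2r2                                    # left side of (7.16)
    r_star = p1r1 * (1.0 + lam) + p2r2 * (1.0 - lam)      # (7.17)
    A1, B1, C1 = (l11 - l12) / 2, (l22 - l21) / 2, p1 * l11 - p2 * l22   # (7.20)
    Z_sq = {(1, 1): math.sqrt(l11 * (1 + lam)), (1, 2): math.sqrt(l12 * (1 + lam)),
            (2, 1): math.sqrt(l21 * (1 - lam)), (2, 2): math.sqrt(l22 * (1 - lam))}  # (7.21)
    Z_lin = {(1, 1): l11, (1, 2): l12, (2, 1): -l21, (2, 2): -l22}                  # (7.22)
    return {
        "grad": grad,
        "first_moment_abc": moments(abc_distribution(p1, phi1, phi2, A1, B1, C1))[0],
        "first_moment_z": moments(z_distribution(p1, phi1, phi2, Z_lin))[0],
        "r_star": r_star,
        "second_moment_z": moments(z_distribution(p1, phi1, phi2, Z_sq))[1],
    }


print(equal_risk_check(0.3, ((0.0, 2.0), (5.0, 1.0)), 0.8, 0.1, 0.4))
```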


7.4.3
The Minimum Criterion for R under the Condition $p_1 r_1 = \alpha = \mathrm{const}$


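The minimum criterion for R under the condition $p_1 r_1 = \alpha = \mathrm{const}$, i.e., under the condition

$$ p_1 l_{11}\Phi_1 + p_1 l_{12}(1-\Phi_1) - \alpha = 0, \qquad (7.23) $$

is equivalent to the minimization of the Lagrangian functional

$$ R^* = p_1 l_{12}(1+\lambda) + p_2 l_{22} + p_1\left(l_{11} - l_{12}\right)(1+\lambda)\Phi_1 + p_2\left(l_{21} - l_{22}\right)\Phi_2 - \lambda\alpha. \qquad (7.24) $$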


Expressions (7.24) and (7.12a) give the following conditions for the parameters A, B, and C providing the coincidence of $\alpha_{2g}$ and $R^*$:

$$ C = \sqrt{p_1 l_{11}(1+\lambda) + p_2 l_{22} - \lambda\alpha}, \qquad A = \frac{1}{2}\left[C + \sqrt{C^2 + (l_{12} - l_{11})(1+\lambda)}\right], \qquad B = \frac{1}{2}\left[\sqrt{C^2 + l_{21} - l_{22}} - C\right]. \qquad (7.25) $$
The transformation parameters $A_1$, $B_1$, and $C_1$ of the discrete error providing the coincidence of the first moment (7.19) with the left side of (7.23) are

$$ A_1 = \frac{l_{11} - l_{12}}{2}, \qquad B_1 = 0, \qquad C_1 = p_1 l_{11} - \alpha. \qquad (7.26) $$


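The use of the Z-transformation introduced above and the comparison of (7.14a) and (7.24) give the following conditions for their coincidence:

$$ Z_{21} = \sqrt{l_{21}}, \qquad Z_{22} = \sqrt{l_{22}}, \qquad Z_{12} = \sqrt{l_{12}(1+\lambda) - \frac{\lambda\alpha}{p_1}}, \qquad Z_{11} = \sqrt{l_{11}(1+\lambda) - \frac{\lambda\alpha}{p_1}}. \qquad (7.27) $$

The gradient of $R^*$ with respect to $\lambda$ is determined by forming a discrete error $x_g^*(n)$ whose distribution has the first moment equal to the left side of (7.23), with the following parameters of the Z-transformation:

$$ Z_{21} = Z_{22} = 0, \qquad Z_{12} = l_{12} - \frac{\alpha}{p_1}, \qquad Z_{11} = l_{11} - \frac{\alpha}{p_1}. \qquad (7.28) $$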
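For the constraint $p_1 r_1 = \alpha$, the analogous check uses (7.26) and (7.28) for the first moment and (7.27) for the second moment. A minimal sketch with arbitrary example values and hypothetical helper names; it assumes $l_{1j}(1+\lambda) \ge \lambda\alpha/p_1$ so that (7.27) stays real:

```python
import math


def moments(values_probs):
    """(first, second) moments of a discrete distribution [(value, prob), ...]."""
    m1 = sum(v * p for v, p in values_probs)
    m2 = sum(v * v * p for v, p in values_probs)
    return m1, m2


def abc_distribution(p1, phi1, phi2, A, B, C):   # distribution (7.11)
    p2 = 1.0 - p1
    return [(C - 2.0 * A, p1 * (1.0 - phi1)),
            (C, p1 * phi1 + p2 * (1.0 - phi2)),
            (2.0 * B + C, p2 * phi2)]


def z_distribution(p1, phi1, phi2, Z):           # distribution (7.14a)
    p2 = 1.0 - p1
    return [(Z[1, 2], p1 * (1.0 - phi1)), (Z[1, 1], p1 * phi1),
            (Z[2, 2], p2 * (1.0 - phi2)), (Z[2, 1], p2 * phi2)]


def fixed_risk_check(p1, l, phi1, phi2, lam, alpha):
    (l11, l12), (l21, l22) = l
    p2 = 1.0 - p1
    p1r1 = p1 * (l11 * phi1 + l12 * (1.0 - phi1))
    p2r2 = p2 * (l21 * phi2 + l22 * (1.0 - phi2))
    constraint = p1r1 - alpha                              # left side of (7.23)
    r_star = p1r1 * (1.0 + lam) + p2r2 - lam * alpha       # (7.24)
    A1, B1, C1 = (l11 - l12) / 2, 0.0, p1 * l11 - alpha   # (7.26)
    shift = lam * alpha / p1
    Z_sq = {(1, 1): math.sqrt(l11 * (1 + lam) - shift),
            (1, 2): math.sqrt(l12 * (1 + lam) - shift),
            (2, 1): math.sqrt(l21), (2, 2): math.sqrt(l22)}     # (7.27)
    Z_lin = {(1, 1): l11 - alpha / p1, (1, 2): l12 - alpha / p1,
             (2, 1): 0.0, (2, 2): 0.0}                           # (7.28)
    return {
        "constraint": constraint,
        "first_moment_abc": moments(abc_distribution(p1, phi1, phi2, A1, B1, C1))[0],
        "first_moment_z": moments(z_distribution(p1, phi1, phi2, Z_lin))[0],
        "r_star": r_star,
        "second_moment_z": moments(z_distribution(p1, phi1, phi2, Z_sq))[1],
    }


print(fixed_risk_check(0.3, ((0.5, 2.0), (5.0, 1.0)), 0.8, 0.1, 0.5, 0.1))
```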
1 1 


7.5 
Neural Network Continuum Models 


Let us consider the procedure of forming the secondary optimization functional in the case of a neural network continuum model. The minimum average risk function criterion will be used, because the extension to other optimization criteria is not difficult.

The problem of forming the optimization functional will be solved for a neural network with an arbitrary structure and illustrated by a concrete example, as was done above.


7.5.1 
Neural Network with a Solution Continuum; Two Pattern Classes 


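The discrete error transformation has the form

$$ x_g^*(n) = \begin{cases} Z_2\left[x_g(n)\right], & \varepsilon(n) = 1, \\ Z_1\left[x_g(n)\right], & \varepsilon(n) = -1. \end{cases} $$

Consequently, the distribution of the transformed error is

$$ f(x_g^*) = p_1 f_{1x_g}\left[Z_1^{-1}(x_g^*)\right]\left|\frac{dZ_1^{-1}(x_g^*)}{dx_g^*}\right| + p_2 f_{2x_g}\left[Z_2^{-1}(x_g^*)\right]\left|\frac{dZ_2^{-1}(x_g^*)}{dx_g^*}\right|, $$

and the expression for the second moment of this distribution, after the change of variables and under the condition that $Z_1$ and $Z_2$ are monotone, is

$$ \alpha_{2g} = \int_{-\infty}^{\infty}\left[Z_1(x_g)\right]^2 p_1 f_{1x_g}(x_g)\,dx_g + \int_{-\infty}^{\infty}\left[Z_2(x_g)\right]^2 p_2 f_{2x_g}(x_g)\,dx_g. $$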


The relationships $y = P(x)$, $x_g = \varepsilon - P(x)$ are valid for a neural network with an arbitrary structure. Then

$$ x_g = \begin{cases} -1 - P(x_1, \ldots, x_N), & \varepsilon = -1, \\ 1 - P(x_1, \ldots, x_N), & \varepsilon = 1. \end{cases} $$


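The discrete error distribution for the patterns of the k-th class has the form

$$ f_{kx_g}(x_g) = \underbrace{\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}}_{N-1} f_k\left[x_1, \ldots, x_{N-1}, P^{-1}(x_g, x_1, \ldots, x_{N-1})\right]\left|\frac{dP^{-1}(x_g, x_1, \ldots, x_{N-1})}{dx_g}\right| dx_1 \cdots dx_{N-1}, $$

where $x_N = P^{-1}(x_g, x_1, \ldots, x_{N-1})$ is obtained by solving $x_g = \varepsilon - P(x)$ for $x_N$ with the value of $\varepsilon$ corresponding to the k-th class.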

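The following expression for the second moment of the discrete error distribution can be obtained after the corresponding transformations:

$$ \alpha_{2g} = \underbrace{\int\cdots\int}_{N}\left\{Z_1\left[-1 - P(x)\right]\right\}^2 p_1 f_1(x)\,dx + \underbrace{\int\cdots\int}_{N}\left\{Z_2\left[1 - P(x)\right]\right\}^2 p_2 f_2(x)\,dx. \qquad (7.29) $$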


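It can be shown that in the particular case of the neuron with a continuum of solutions, using (7.8a), one gets

$$ \alpha_{2g} = \underbrace{\int\cdots\int}_{N}\left\{Z_1\left[-1 - \left(\sum_{i=1}^{N} a_i x_i - a_0\right)\right]\right\}^2 p_1 f_1(x)\,dx + \underbrace{\int\cdots\int}_{N}\left\{Z_2\left[1 - \left(\sum_{i=1}^{N} a_i x_i - a_0\right)\right]\right\}^2 p_2 f_2(x)\,dx. $$

In the general case, with $x_g(x) = \varepsilon - P(x)$,

$$ \alpha_{2g} = \underbrace{\int\cdots\int}_{N}\left\{Z_1\left[x_g(x)\right]\right\}^2 p_1 f_1(x)\,dx + \underbrace{\int\cdots\int}_{N}\left\{Z_2\left[x_g(x)\right]\right\}^2 p_2 f_2(x)\,dx. $$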




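The comparison of this expression with the expression for the average risk function

$$ R = \underbrace{\int\cdots\int}_{N}\left\{p_1 f_1(x)\, l_1\left[P(x)\right] + p_2 f_2(x)\, l_2\left[P(x)\right]\right\} dx, $$

where $l_k(y)$ is the loss incurred when a pattern of the k-th class produces the output y, gives the relationships for the discrete error transformation required for the equality of $\alpha_{2g}$ and R:

$$ Z_1(x_g) = \sqrt{l_1(-1 - x_g)}, \qquad Z_2(x_g) = \sqrt{l_2(1 - x_g)}. \qquad (7.30) $$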


7.5.2 
Neural Network with a Solution Continuum; Continuum of Pattern Classes 


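In this case,

$$ f_{x_g}(x_g) = \int_{-\infty}^{\infty} f_{x_g}(x_g \mid \varepsilon)\, f_\varepsilon(\varepsilon)\,d\varepsilon. $$

Consequently, under the condition that the function $Z(x_g)$ is monotone,

$$ \alpha_{2g} = \int_{-\infty}^{\infty}\left[Z(x_g)\right]^2 f_{x_g}(x_g)\,dx_g. \qquad (7.31) $$

Here

$$ y = P(x), \qquad x_g = \varepsilon - P(x), \qquad x_N = P^{-1}(x_g, \varepsilon, x_1, \ldots, x_{N-1}), $$

$$ f_{x_g}(x_g) = \int_{-\infty}^{\infty}\underbrace{\int\cdots\int}_{N-1} f\left[x_1, \ldots, x_{N-1}, P^{-1}(x_g, \varepsilon, x_1, \ldots, x_{N-1}) \mid \varepsilon\right]\left|\frac{dP^{-1}(x_g, \varepsilon, x_1, \ldots, x_{N-1})}{dx_g}\right| f_\varepsilon(\varepsilon)\,dx_1 \cdots dx_{N-1}\,d\varepsilon. $$

Then, after the corresponding change of variables and using (7.31), one gets

$$ \alpha_{2g} = \int_{-\infty}^{\infty}\underbrace{\int\cdots\int}_{N}\left\{Z\left[\varepsilon - P(x)\right]\right\}^2 f(x \mid \varepsilon)\, f_\varepsilon(\varepsilon)\,d\varepsilon\,dx. \qquad (7.32) $$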
=o 98 


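It can be shown that in the particular case of the neuron with a continuum of solutions, using (7.7a), one gets

$$ \alpha_{2g} = \int_{-\infty}^{\infty}\underbrace{\int\cdots\int}_{N}\left\{Z\left[\varepsilon - \left(\sum_{i=1}^{N} a_i x_i - a_0\right)\right]\right\}^2 f(x \mid \varepsilon)\, f_\varepsilon(\varepsilon)\,d\varepsilon\,dx. $$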
i=l 
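The comparison of this expression with the expression for the average risk function

$$ R = \int_{-\infty}^{\infty}\underbrace{\int\cdots\int}_{N} f_\varepsilon(\varepsilon)\, f(x \mid \varepsilon)\, l\left[y = P(x), \varepsilon\right] dx\,d\varepsilon $$

provides the relation for the discrete error transformation required for the equality of $\alpha_{2g}$ and R:

$$ Z^2(x_g) = l(\varepsilon - x_g, \varepsilon), $$

so that, in general, the transformation depends on $\varepsilon$ as well.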


7.5.3
Neural Network with $K_p$ Solutions; K Pattern Classes


Equation (7.10a) gives the distribution of the transformed discrete error for the neuron with $K_p$ solutions in the case of the k-th pattern class: the value $x_g^* = (k - k_p)A_{kk_p}$ is taken with probability $p_k P_k(k_p)$, where $P_k(k_p)$ denotes the probability that the neuron produces the solution $k_p$ for a pattern of the k-th class, expressed in (7.10a) through the distribution functions $F_k$ evaluated at the neuron thresholds. Consequently,

$$ \alpha_{2g} = \sum_{k=1}^{K}\sum_{k_p=1}^{K_p}(k - k_p)^2 A_{kk_p}^2\, p_k P_k(k_p). $$

In the case of the neural network with an arbitrary structure,

$$ \alpha_{2g} = \sum_{k=1}^{K}\sum_{k_p=1}^{K_p}(k - k_p)^2 A_{kk_p}^2\, p_k \int\cdots\int_{S_{k_p}(x) > 0} f_k(x)\,dx. $$

The comparison of this expression with the expression for the average risk function

$$ R = \sum_{k=1}^{K}\sum_{k_p=1}^{K_p} l_{kk_p}\, p_k \int\cdots\int_{S_{k_p}(x) > 0} f_k(x)\,dx $$

provides the relation for the discrete error transformation required for the equality of $\alpha_{2g}$ and R:

$$ A_{kk_p}^2 = \frac{l_{kk_p}}{(k - k_p)^2}, \qquad k \neq k_p. $$
7.5.4
Neural Network with $N^*$ Output Channels; $K_0$ Gradations in Each Class


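The distribution function of the discrete error for the set of patterns of the $(k_1, \ldots, k_{N^*})$-th class is

$$ f_{(k_1, \ldots, k_{N^*})}(x_{1g}, \ldots, x_{N^*g}) = \int\cdots\int_{S_{(k_{1p}, \ldots, k_{N^*p})}(x) > 0} f_{(k_1, \ldots, k_{N^*})}(x)\,dx $$

at $(x_{1g}, \ldots, x_{N^*g}) = (k_1, \ldots, k_{N^*}) - (k_{1p}, \ldots, k_{N^*p})$.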

Let us use the following transformation of the vector $(x_{1g}, \ldots, x_{N^*g})$ to form the transformed discrete error $x_g^*(n)$. Multiply the vector $x_g$ by the scalar $A(k_1, \ldots, k_{N^*}, k_{1p}, \ldots, k_{N^*p})$ and calculate the sum of squares of the components of the resulting vector; the result is the transformed discrete error $x_g^*(n)$. For the set of pattern classes,

$$ M\left[x_g^*\right] = \sum_{k_1=1}^{K_0}\cdots\sum_{k_{N^*}=1}^{K_0}\ \sum_{k_{1p}=1}^{K_0}\cdots\sum_{k_{N^*p}=1}^{K_0}\left[(k_1 - k_{1p})^2 + \cdots + (k_{N^*} - k_{N^*p})^2\right] A^2(k_1, \ldots, k_{N^*}, k_{1p}, \ldots, k_{N^*p})\, f_\varepsilon(k_1, \ldots, k_{N^*}) \int\cdots\int_{S_{(k_{1p}, \ldots, k_{N^*p})}(x) > 0} f(x \mid k_1, \ldots, k_{N^*})\,dx. $$


The comparison of this expression with the expression for the average risk function provides the relation for the transformation parameter A in the following form:

$$ A^2(k_1, \ldots, k_{N^*}, k_{1p}, \ldots, k_{N^*p}) = \frac{l\left[k_1, \ldots, k_{N^*}, k_{1p}, \ldots, k_{N^*p}\right]}{(k_1 - k_{1p})^2 + \cdots + (k_{N^*} - k_{N^*p})^2}. \qquad (7.33) $$

This transformation makes the values $M\left[x_g^*\right]$ and R equal.


7.5.5 
Neural Network with N* Output Channels - Neural Network Solution Continuum 


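In this case, the vector $x_g = \varepsilon - P(x)$ has dimensionality $N^*$. The distribution density of the sum of squares of its components, $x_g^2 = \sum_{i=1}^{N^*} x_{ig}^2$, has the form

$$ f(x_g^2) = \int\underbrace{\int\cdots\int}_{N-1} f\left[x_1, \ldots, x_{N-1}, P'^{-1}(x_g^2, \varepsilon, x_1, \ldots, x_{N-1}) \mid \varepsilon\right] f_\varepsilon(\varepsilon)\left|\frac{dP'^{-1}(x_g^2, \varepsilon, x_1, \ldots, x_{N-1})}{dx_g^2}\right| dx_1 \cdots dx_{N-1}\,d\varepsilon, $$

where $x_N = P'^{-1}(x_g^2, \varepsilon, x_1, \ldots, x_{N-1})$ is obtained by solving $x_g^2 = \sum_{i=1}^{N^*}\left[\varepsilon_i - P_i(x)\right]^2$ for $x_N$.

The distribution of the square of the transformed discrete error, $x_g^{*2} = Z(x_g^2)$, is

$$ f(x_g^{*2}) = f\left[Z^{-1}(x_g^{*2})\right]\left|\frac{dZ^{-1}(x_g^{*2})}{dx_g^{*2}}\right|. $$

Consequently, the first moment of this distribution is

$$ \alpha = \int x_g^{*2} f(x_g^{*2})\,dx_g^{*2} = \int Z(x_g^2)\, f(x_g^2)\,dx_g^2. $$

After the change of variables,

$$ \alpha = \int\underbrace{\int\cdots\int}_{N} Z\left\{\sum_{i=1}^{N^*}\left[\varepsilon_i - P_i(x)\right]^2\right\} f(x \mid \varepsilon)\, f_\varepsilon(\varepsilon)\,dx\,d\varepsilon. $$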


The comparison of this expression with the expression for the average risk function in the case of the neural networks with $N^*$ output channels and a continuum of solutions gives the following equation for the discrete error transformation function:

$$ Z\left\{\sum_{i=1}^{N^*}\left[\varepsilon_i - P_i(x)\right]^2\right\} = l\left[P(x), \varepsilon\right]. \qquad (7.34) $$


7.6 
Neural Network in the Self-Learning Mode and Arbitrary 
Teacher Qualification 


The expression for the average risk function in the self-learning mode in the case of $K_p$ solutions has the form


It can be shown that in the case of the system with $K_p$ solutions, the transformation of the output signal y forming the signal $y'$ with the first moment of its distribution equal to R is

$$ y' = \rho\left[x - b(y)\right] \qquad (7.35) $$
and in the case of an arbitrary teacher qualification,

$$ y' = l(y, \varepsilon)\, b + (1 - b^2)\, \rho\left[x - b(y)\right] \qquad (7.36) $$


Equations (7.35) and (7.36) are also valid in the case of the neural network with a continuum of solutions.
Chapter 8


Development of Multivariable Function Extremum 
Search Algorithms 


8.1 
Procedure of the Secondary Optimization Functional Extremum 
Search in Multilayer Neural Networks 


In the present study, the extremum of the secondary optimization functional is sought by iterative methods that use a gradient search procedure for the local extremum. The problems of stability and convergence of the gradient procedures and the possibility of accelerating the extremum search are considered. Constraints of the equality and inequality types are considered for the case of multilayer neural network implementation.

Methods of multivariable function extremum search have developed mainly along two lines. The first line includes the design of standard search programs. The function properties in this case are assumed to be known in sufficient detail. Method convergence and precision in the stationary state are usually analyzed without investigating the dynamics of transient processes.

The second line includes the design of adjustment algorithms for adaptive systems. The function is given in the most general form owing to the specificity of the problem and to operation under poor a priori information about the input signal properties [8-1 to 8-29].

A multilayer neural network is a particular case of an adaptive system. A peculiarity of adaptive system design is that even for a fixed structure of the open-loop neural network, nothing is known in advance about the form of the secondary optimization functional. One can only know that it has some local extrema to be found in the process of closed-cycle adjustment. In general, the problem of optimizing the adjustment circuit of a multilayer neural network cannot be solved at the stage of the optimization functional extremum search.

That is why the main content of Chap. 12 is the adjustment circuit optimization in 
the investigation of closed-loop systems with quality estimation using the current value 
of the primary optimization functional. 


8.2 
Analysis of the Iteration Method for the Multivariable Function 
Extremum Search 


A general expression for the calculation of the system state vector at time n + 1 from the state vector at time n, in the search for the extremum of the function Y(a), has the form (for a search system with unit memory)

$$ a(n+1) = a(n) + K^*\left.\frac{dY(a)}{da}\right|_{a=a(n)}. \qquad (8.1) $$

Here Y(a) is the secondary optimization functional; $a(n)$ is the system state vector (the current value of the argument of the function whose extremum is sought); $K^*$ is the $[N^0 \times N^0]$ matrix of coefficients; and $N^0$ is the dimensionality of vector a.

The choice of the coefficients of matrix $K^*$ determines the rate of convergence and the quality of the iteration method.

Procedure (8.1) describes the known search methods: scan-out, steepest descent, gradient, Gauss-Seidel, Rosenbrock, Powell, Sawswell, etc.

The main problem is the choice of constraints upon the parameters of matrix $K^*$ that provide the necessary quality of the function extremum search system. Let us consider a particular form of the neural network quality function:

$$ Y(a) = a^{T}Aa + B^{T}a + C. \qquad (8.1a) $$

Here A is the matrix of coefficients of the functional Y(a); B is the vector of coefficients; and C is a scalar coefficient.


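Consequently, for a symmetric matrix $A = \|a_{ij}\|$,

$$ \frac{dY(a)}{da} = 2Aa + B, \qquad \frac{\partial^2 Y(a)}{\partial a_i\,\partial a_j} = 2a_{ij}, \qquad i, j = 1, \ldots, N^0. \qquad (8.2) $$

Expressions (8.1) and (8.2) give a recurrent expression for the system state vector at step n + 1 through the state vector at step n in the following form:

$$ a(n+1) = a(n) + K^*\left[2Aa(n) + B\right] $$

or

$$ a(n+1) = K^*B + \left[I + 2K^*A\right]a(n). \qquad (8.3) $$

Here I is the unit matrix.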
Let us determine the coefficients of matrix $K^*$ that provide the convergence of the iteration procedure in one step starting from any initial state. The value $a(1)$ providing the extremum is determined in the following way:

$$ a(1) = -\frac{1}{2}A^{-1}B. $$

Putting it into (8.3), one gets the following optimal matrix $K^*$:

$$ K^*_{\mathrm{opt}} = -\frac{1}{2}A^{-1}. $$


The system that performs the transition to the (n + 1)-th step using the results obtained at the n-th step is called stable if the function value at the (n + 1)-th step is less than that at the n-th step. Otherwise, the system is called self-oscillating, or unstable. Thus, stability requires (I being the unit matrix)

$$ \begin{cases} a(n+1) = K^*B + \left[I + 2K^*A\right]a(n), \\ a^{T}(n+1)Aa(n+1) + B^{T}a(n+1) < a^{T}(n)Aa(n) + B^{T}a(n). \end{cases} \qquad (8.4) $$


The solution of this system of relations requires the use of a computer. A search system with a particular form of matrix $K^*$ (a particular method of extremum search) is stable if $K^*$ satisfies (8.4).

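Let us obtain the non-recurrent expression for $a(n)$ (I denotes the unit matrix). It follows from (8.3) that

$$ a(1) = K^*B + \left[I + 2K^*A\right]a(0), $$

$$ a(2) = K^*B + \left[I + 2K^*A\right]K^*B + \left[I + 2K^*A\right]^2 a(0), $$

$$ a(3) = K^*B + \left[I + 2K^*A\right]K^*B + \left[I + 2K^*A\right]^2 K^*B + \left[I + 2K^*A\right]^3 a(0). $$

By induction,

$$ a(n) = \left[I + (I + 2K^*A) + \cdots + (I + 2K^*A)^{n-1}\right]K^*B + (I + 2K^*A)^n a(0). $$

Taking into account that

$$ I + (I + 2K^*A) + \cdots + (I + 2K^*A)^{n-1} = \left[I - (I + 2K^*A)\right]^{-1}\left[I - (I + 2K^*A)^n\right] = -(2K^*A)^{-1}\left[I - (I + 2K^*A)^n\right], $$

one can write the expression for $a(n)$ in the following form:

$$ a(n) = (I + 2K^*A)^n a(0) - (2K^*A)^{-1}\left[I - (I + 2K^*A)^n\right]K^*B. $$

Then, for a nonsingular matrix $K^*$, the final non-recurrent expression for the search system state vector is

$$ a(n) = (I + 2K^*A)^n a(0) + \frac{1}{2}\left[(I + 2K^*A)^n - I\right]A^{-1}B. \qquad (8.5) $$

Putting the condition of optimal operation speed of the search system, $K^* = -0.5A^{-1}$, into (8.5), one obtains the expected result

$$ a(n) = -\frac{1}{2}A^{-1}B, \qquad n = 1, 2, \ldots, $$

which corresponds to the extremum of the function.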
The analysis of the non-recurrent expression gives the constraints upon the parameters of matrix $K^*$ that provide the convergence of the search iteration procedure.
It follows from (8.5) that

$$ \lim_{n\to\infty} a(n) = -\frac{1}{2}A^{-1}B, $$

i.e., the limit does not depend on $a(0)$ and equals the extremum value of the state vector, if

$$ \lim_{n\to\infty}\left(I + 2K^*A\right)^n = O, $$

where I is the unit matrix and O is the zero matrix. This condition holds if and only if all eigenvalues of $I + 2K^*A$ are less than 1 in absolute value, and it can be used to verify the convergence of the search system. The expression for the matrix $K^*$ providing the self-oscillation mode of the search procedure is given in [8-12].
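The recurrence (8.3), the closed form (8.5), the one-step gain $K^*_{\mathrm{opt}} = -\frac{1}{2}A^{-1}$, and the eigenvalue test for convergence can be compared directly. A minimal NumPy sketch, assuming a symmetric positive definite A (so that a minimum is sought) and arbitrary example values; the function names are hypothetical:

```python
import numpy as np


def iterate(A, B, K, a0, n):
    """Apply the recurrence (8.3) n times: a <- a + K (2 A a + B)."""
    a = np.array(a0, dtype=float)
    for _ in range(n):
        a = a + K @ (2.0 * A @ a + B)
    return a


def closed_form(A, B, K, a0, n):
    """Non-recurrent expression (8.5)."""
    I = np.eye(len(a0))
    Mn = np.linalg.matrix_power(I + 2.0 * K @ A, n)
    return Mn @ np.asarray(a0, dtype=float) + 0.5 * (Mn - I) @ np.linalg.solve(A, B)


def converges(A, K):
    """(I + 2 K A)^n -> O exactly when all eigenvalues have modulus below 1."""
    I = np.eye(A.shape[0])
    return bool(np.max(np.abs(np.linalg.eigvals(I + 2.0 * K @ A))) < 1.0)


A = np.array([[2.0, 0.5], [0.5, 1.0]])     # symmetric positive definite
B = np.array([1.0, -3.0])
a0 = np.array([4.0, -2.0])
K_opt = -0.5 * np.linalg.inv(A)            # one-step gain
K_grad = -0.1 * np.eye(2)                  # small fixed gradient gain
extremum = -0.5 * np.linalg.solve(A, B)

print(iterate(A, B, K_opt, a0, 1), extremum)
print(converges(A, K_grad), converges(A, -2.0 * np.eye(2)))
```

The fixed gain $-0.1 I$ approaches the extremum geometrically, while $-2I$ violates the eigenvalue condition and diverges.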


8.3 
About the Stochastic Approximation Method 


The stochastic approximation method is realized by a search system similar to the gradient one. However, in this case the system parameters (matrix $K^*$) are not fixed [8-1, 8-3, 8-6, 8-8]. The stochastic approximation method is used when the gradient vector of the minimized function is measured with random errors. The presence of random errors makes it necessary to vary the search system parameters so that the random error in determining the extreme point vanishes. The disadvantage of this method, namely the increase of systematic errors during the transient process of the extreme point search, is noted in [8-29].

The present study deals with a neural network synthesis technique in which the stochastic approximation method can be combined with other search methods with fixed parameters. The closed-loop neural network is designed under an initial indeterminacy of matrix $K^*$ that is eliminated only at the stage of analysis of the closed-loop system. The question of an optimal choice of the parameters of matrix $K^*$ is ill-posed in this case, because the form of the minimized function is not known in advance.
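The effect of a variable gain can be shown on a scalar example. The sketch below uses the classical Robbins-Monro gain sequence $\gamma_n = c/n$, which is an assumption (the text does not fix a particular sequence), and a hypothetical quality function $Y(a) = (a - 3)^2$ whose gradient is measured with additive Gaussian noise:

```python
import random


def noisy_gradient(a, rng, target=3.0, noise=1.0):
    """Measured gradient of the hypothetical Y(a) = (a - target)^2 plus Gaussian noise."""
    return 2.0 * (a - target) + rng.gauss(0.0, noise)


def stochastic_approximation(n_steps, gain, seed=0, a0=0.0):
    """a(n+1) = a(n) - gain(n) * (noisy gradient); gain(n) may change with n."""
    rng = random.Random(seed)
    a = a0
    for n in range(1, n_steps + 1):
        a -= gain(n) * noisy_gradient(a, rng)
    return a


a_decreasing = stochastic_approximation(20000, gain=lambda n: 0.5 / n)
a_fixed = stochastic_approximation(20000, gain=lambda n: 0.1)
print(a_decreasing, a_fixed)
```

With the decreasing gain the random error of the estimate shrinks roughly like $1/\sqrt{n}$; with the fixed gain 0.1 the estimate keeps fluctuating around the minimum (for this example its stationary standard deviation is about 0.17).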


8.4 
Iteration Methods for Multivariable Function Extremum Search 
in the Case of Equality-Type Constraints upon Variables 


The equality-type constraints upon the adjustable neural network coefficients can be written in the general form

$$ q_\mu(a) = 0, \qquad \mu = 1, \ldots, M_1, \qquad M_1 < N^0 + 1. $$

In real neural networks, there are, for example, constraints upon the sum of the coefficients:

$$ \sum_{i=0}^{N^0} a_i = \alpha = \mathrm{const}. \qquad (8.6) $$
8.4.1
Search Algorithm


In this case, the problem of minimizing the quality function Y(a) is solved for multilayer neural networks by constructing the Lagrange function

$$ Y(a, \lambda) = Y(a) + \lambda^{T}q(a), $$

where $\lambda^{T} = [\lambda_1, \ldots, \lambda_{M_1}]$ is the vector of Lagrange multipliers and $q^{T}(a) = [q_1(a), \ldots, q_{M_1}(a)]$ is the vector function of constraints.

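The solution of the minimization problem is reduced to the solution of the following system of equations:

$$ \frac{\partial Y(a,\lambda)}{\partial a} = \frac{dY(a)}{da} + Q(a)\lambda = 0, \qquad \frac{\partial Y(a,\lambda)}{\partial \lambda} = q(a) = 0. \qquad (8.7) $$

Here

$$ Q(a) = \frac{dq^{T}(a)}{da} = \begin{bmatrix} \dfrac{\partial q_1(a)}{\partial a_0} & \cdots & \dfrac{\partial q_{M_1}(a)}{\partial a_0} \\ \vdots & & \vdots \\ \dfrac{\partial q_1(a)}{\partial a_{N^0}} & \cdots & \dfrac{\partial q_{M_1}(a)}{\partial a_{N^0}} \end{bmatrix}. $$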


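The recurrent relationship for the search algorithm follows from (8.7):

$$ \begin{aligned} a(n+1) &= a(n) + K^*_{aa}(n)\frac{\partial Y(a,\lambda)}{\partial a} + K^*_{a\lambda}(n)\frac{\partial Y(a,\lambda)}{\partial \lambda}, \\ \lambda(n+1) &= \lambda(n) + K^*_{\lambda a}(n)\frac{\partial Y(a,\lambda)}{\partial a} + K^*_{\lambda\lambda}(n)\frac{\partial Y(a,\lambda)}{\partial \lambda}, \end{aligned} $$

where the derivatives are evaluated at $a = a(n)$, $\lambda = \lambda(n)$. The search system in this case can be represented by an equivalent discrete system with the parameter matrices $K^*_{aa}$, $K^*_{a\lambda}$, $K^*_{\lambda a}$, $K^*_{\lambda\lambda}$. Taking into account (8.7), the final expression for the search algorithm can be written in the form

$$ \begin{aligned} a(n+1) &= a(n) + K^*_{aa}(n)\left[\frac{dY(a)}{da} + Q(a)\lambda\right] + K^*_{a\lambda}(n)\,q(a), \\ \lambda(n+1) &= \lambda(n) + K^*_{\lambda a}(n)\left[\frac{dY(a)}{da} + Q(a)\lambda\right] + K^*_{\lambda\lambda}(n)\,q(a), \end{aligned} \qquad a = a(n),\ \lambda = \lambda(n). $$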
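In the case of constraints (8.6),

$$ \begin{aligned} a(n+1) &= a(n) + K^*_{aa}(n)\left[\left.\frac{dY(a)}{da}\right|_{a=a(n)} + \mathbf{1}\lambda(n)\right] + K^*_{a\lambda}(n)\left[\sum_{i=0}^{N^0} a_i(n) - \alpha\right], \\ \lambda(n+1) &= \lambda(n) + K^*_{\lambda a}(n)\left[\left.\frac{dY(a)}{da}\right|_{a=a(n)} + \mathbf{1}\lambda(n)\right] + K^*_{\lambda\lambda}(n)\left[\sum_{i=0}^{N^0} a_i(n) - \alpha\right], \end{aligned} $$

where $\mathbf{1}$ is the column vector of dimensionality $N^0 + 1$ consisting of ones.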
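A simple member of the family above is obtained with $K^*_{aa} = -\gamma I$, $K^*_{\lambda\lambda} = \gamma$ and zero cross gains, i.e., descent in $a$ and ascent in $\lambda$ (the Arrow-Hurwicz scheme). The choice of gains, the step $\gamma$, and the test data are assumptions of this sketch; the result is compared with the exact solution of (8.7) for the quadratic functional (8.1a) under constraint (8.6), and the function names are hypothetical:

```python
import numpy as np


def kkt_solution(A, B, alpha):
    """Exact solution of (8.7) for Y(a) = a^T A a + B^T a and constraint (8.6)."""
    n = len(B)
    ones = np.ones((n, 1))
    M = np.block([[2.0 * A, ones], [ones.T, np.zeros((1, 1))]])
    sol = np.linalg.solve(M, np.concatenate([-B, [alpha]]))
    return sol[:n], sol[n]


def lagrange_search(A, B, alpha, gamma=0.05, n_steps=5000):
    """Recurrence of Sect. 8.4.1 with K_aa = -gamma*I, K_al = K_la = 0, K_ll = gamma."""
    a = np.zeros(len(B))
    lam = 0.0
    for _ in range(n_steps):
        grad_a = 2.0 * A @ a + B + lam      # dY/da + 1*lambda (Q(a) is a column of ones)
        q = a.sum() - alpha                 # constraint residual q(a)
        a, lam = a - gamma * grad_a, lam + gamma * q
    return a, lam


A = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.5]])
B = np.array([1.0, -2.0, 0.5])
alpha = 1.0
a_it, lam_it = lagrange_search(A, B, alpha)
a_ex, lam_ex = kkt_solution(A, B, alpha)
print(a_it, a_ex, lam_it, lam_ex)
```

Using the same sign for both gains would make the iteration diverge, because the matrix of second derivatives of the Lagrange function is indefinite; the opposite signs turn the saddle point into an attracting one.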


8.4.2 
Analysis of the Matrix of the Second Derivatives of the Lagrange Function 


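If Y(a) is determined by (8.1a), the constraints are linear (as in (8.6)), and the notation $y^{T} = [a_0, \ldots, a_{N^0}, \lambda_1, \ldots, \lambda_{M_1}]$ is introduced, then

$$ \frac{\partial^2 Y(a,\lambda)}{\partial y_i\,\partial y_j} = \begin{cases} [\mathrm{I}], & i, j = 0, \ldots, N^0, \\ [\mathrm{II}], & i = 0, \ldots, N^0,\ j = N^0+1, \ldots, N^0+M_1, \\ [\mathrm{III}], & i = N^0+1, \ldots, N^0+M_1,\ j = 0, \ldots, N^0, \\ [\mathrm{IV}], & i, j = N^0+1, \ldots, N^0+M_1. \end{cases} $$

It is clear that $[\mathrm{I}] = 2A$, $[\mathrm{II}] = Q(a)$, $[\mathrm{III}] = Q^{T}(a)$, and $[\mathrm{IV}] = 0$.

Consequently, the matrix of the second derivatives of the Lagrange function has the following form:

$$ \frac{\partial^2 Y(a,\lambda)}{\partial y_i\,\partial y_j} = \begin{bmatrix} 2A & Q \\ Q^{T} & 0 \end{bmatrix}, \qquad i, j = 0, \ldots, N^0, N^0+1, \ldots, N^0+M_1. $$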

8.4.3
Operation Speed Optimization for the Extremum Search Iteration Procedure in the Case of Equality-Type Constraints


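Using the Newton method for minimization of the Lagrange function, one obtains the following conditions for optimal operation speed:

$$ K^*(n) = \begin{bmatrix} K^*_{aa}(n) & K^*_{a\lambda}(n) \\ K^*_{\lambda a}(n) & K^*_{\lambda\lambda}(n) \end{bmatrix} = -\left[\left.\frac{\partial^2 Y(y)}{\partial y_i\,\partial y_j}\right|_{y=y(n)}\right]^{-1}, \qquad i, j = 0, \ldots, N^0, N^0+1, \ldots, N^0+M_1. $$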
It can be shown that, for a nonsingular matrix A, the condition for the existence of the inverse of the matrix of second derivatives of the Lagrange function is that $M_1$ be equal to the rank of matrix Q. Consequently,

$$ \begin{bmatrix} A_1 & Q \\ Q^{T} & 0 \end{bmatrix}^{-1} = \begin{bmatrix} A_1^{-1} + A_1^{-1}QH^{-1}Q^{T}A_1^{-1} & -A_1^{-1}QH^{-1} \\ -H^{-1}Q^{T}A_1^{-1} & H^{-1} \end{bmatrix}, $$

where $H = -Q^{T}A_1^{-1}Q$ and $A_1 = 2A$. The expressions for the matrices $K^*_{aa}(n)$, $K^*_{a\lambda}(n)$, $K^*_{\lambda a}(n)$, and $K^*_{\lambda\lambda}(n)$ providing the optimal operation speed of the search procedure follow from these equations.


8.4.4 
Optimal Operation Speed under Constraints (8.6) 


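In this case $K^*_{aa}(n) = -A_1^{-1}[I + L]$, where I is the unit matrix of dimensionality $[(N^0+1) \times (N^0+1)]$ and $L = QH^{-1}Q^{T}A_1^{-1} = Q(-Q^{T}A_1^{-1}Q)^{-1}Q^{T}A_1^{-1}$.

In the particular case $Q^{T} = [1, \ldots, 1]$,

$$ H^{-1} = -\frac{1}{\sigma_A}, \qquad \sigma_A = \sum_{i=0}^{N^0}\sum_{j=0}^{N^0} a'_{ij}, \qquad A_1^{-1} = \|a'_{ij}\|, $$

$$ L = -\frac{1}{\sigma_A}QQ^{T}A_1^{-1} = -\frac{1}{\sigma_A}\begin{bmatrix} \sigma_0 & \cdots & \sigma_{N^0} \\ \vdots & & \vdots \\ \sigma_0 & \cdots & \sigma_{N^0} \end{bmatrix}, \qquad \sigma_j = \sum_{i=0}^{N^0} a'_{ij}, \quad j = 0, \ldots, N^0. $$

It must be mentioned that, whatever the a priori information, matrix $K^*_{aa}$ differs from the matrix $(-A_1^{-1})$, and it is non-diagonal even if matrix A is diagonal. Matrix $K^*_{\lambda\lambda}$ has the following form:

$$ K^*_{\lambda\lambda} = -H^{-1} = \frac{1}{\sigma_A}, $$

i.e., under one constraint, the optimal $K^*_{\lambda\lambda}$ is determined only by the sum of all the elements of matrix $A_1^{-1}$. In this case, for a symmetric matrix A,

$$ K^*_{a\lambda} = A_1^{-1}QH^{-1} = A_1^{-1}Q\left(-Q^{T}A_1^{-1}Q\right)^{-1} = -\frac{1}{\sigma_A}\begin{bmatrix} \sigma_0 \\ \vdots \\ \sigma_{N^0} \end{bmatrix}. $$

Matrix $K^*_{a\lambda}$ is not a zero matrix under any a priori information about matrix A, i.e., cross-connections are present in the search algorithm.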
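For a quadratic functional, the gains of Sect. 8.4.4 reproduce the negative inverse of the matrix of second derivatives of the Lagrange function, so one Newton step lands on the constrained extremum from any starting point. A NumPy sketch, assuming a symmetric positive definite A and the single constraint (8.6); the function name and test values are hypothetical:

```python
import numpy as np


def newton_gains_sum_constraint(A):
    """Blocks of K* from Sect. 8.4.4 for the single constraint Q^T = [1, ..., 1]."""
    n = A.shape[0]
    A1_inv = np.linalg.inv(2.0 * A)             # A_1^{-1} = ||a'_ij||, A_1 = 2A
    col = A1_inv.sum(axis=0)                    # sigma_j: column sums
    row = A1_inv.sum(axis=1)                    # row sums (equal to col for symmetric A)
    sigma_A = A1_inv.sum()                      # sum of all elements
    L = -np.outer(np.ones(n), col) / sigma_A    # L = Q H^{-1} Q^T A_1^{-1}
    K_aa = -A1_inv @ (np.eye(n) + L)
    K_al = -row.reshape(n, 1) / sigma_A         # A_1^{-1} Q H^{-1}
    K_la = -col.reshape(1, n) / sigma_A         # H^{-1} Q^T A_1^{-1}
    K_ll = np.array([[1.0 / sigma_A]])          # -H^{-1}
    return np.block([[K_aa, K_al], [K_la, K_ll]])


A = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.5]])
B = np.array([1.0, -2.0, 0.5])
alpha = 1.0
n = len(B)
hessian = np.block([[2.0 * A, np.ones((n, 1))], [np.ones((1, n)), np.zeros((1, 1))]])
K = newton_gains_sum_constraint(A)

y0 = np.array([5.0, -3.0, 2.0, 7.0])            # arbitrary [a_0, a_1, a_2, lambda]
grad = np.concatenate([2.0 * A @ y0[:n] + B + y0[n], [y0[:n].sum() - alpha]])
y1 = y0 + K @ grad                              # one Newton step
print(y1)
```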


8.4.5 
The Case of Constraints of Equality Type That Can Be Solved 


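When the constraints have the linear form

$$ Qa = \alpha, \qquad Q = \begin{bmatrix} q_{10} & \cdots & q_{1N^0} \\ \vdots & & \vdots \\ q_{M_10} & \cdots & q_{M_1N^0} \end{bmatrix}, \qquad (8.8) $$

one can solve the system of equations for $M_1$ variables and express the coefficients $a_0, \ldots, a_{M_1-1}$ through the remaining $(N^0 + 1 - M_1)$ variables. Matrix Q in this case is divided into two blocks:

$$ Q = [Q_1 \mid Q_2], \qquad Q_1 = \begin{bmatrix} q_{10} & \cdots & q_{1,M_1-1} \\ \vdots & & \vdots \\ q_{M_10} & \cdots & q_{M_1,M_1-1} \end{bmatrix}, \qquad Q_2 = \begin{bmatrix} q_{1M_1} & \cdots & q_{1N^0} \\ \vdots & & \vdots \\ q_{M_1M_1} & \cdots & q_{M_1N^0} \end{bmatrix}. $$

Then constraints (8.8) take the following form:

$$ Q_1a' + Q_2a'' = \alpha, $$

where

$$ a'^{T} = [a_0, \ldots, a_{M_1-1}], \qquad a''^{T} = [a_{M_1}, \ldots, a_{N^0}], \qquad a' = Q_1^{-1}\left(\alpha - Q_2a''\right), \qquad (8.9) $$

assuming $Q_1$ is nonsingular. This expression is substituted into Y(a), and the extremum of the resulting function of $(N^0 + 1 - M_1)$ variables is found by the previously described method. The optimal values of these $(N^0 + 1 - M_1)$ variables are determined, and then the optimal values of the remaining $M_1$ variables are determined according to (8.9). Taking into account (8.6), expression (8.9) takes the form

$$ a_0 = \alpha - \sum_{i=1}^{N^0} a_i. $$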
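The elimination method can be sketched for constraint (8.6): $a_0$ is expressed through the remaining coefficients, the reduced quadratic function is minimized without constraints, and $a_0$ is recovered. The sketch assumes the quadratic functional (8.1a) with a symmetric positive definite A; the function name and values are hypothetical:

```python
import numpy as np


def solve_by_elimination(A, B, alpha):
    """Minimize a^T A a + B^T a subject to sum(a) = alpha by eliminating a_0."""
    n = len(B)
    # a = T a'' + t  with  a'' = [a_1, ..., a_N0]  and  a_0 = alpha - sum(a'')
    T = np.vstack([-np.ones((1, n - 1)), np.eye(n - 1)])
    t = np.zeros(n)
    t[0] = alpha
    # Stationarity of the reduced function: 2 T^T A (T a'' + t) + T^T B = 0
    H = 2.0 * T.T @ A @ T
    g = T.T @ (2.0 * A @ t + B)
    a_rest = np.linalg.solve(H, -g)
    return T @ a_rest + t


A = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.5]])
B = np.array([1.0, -2.0, 0.5])
alpha = 1.0
a_elim = solve_by_elimination(A, B, alpha)
print(a_elim, a_elim.sum())
```

The reduced problem has $N^0$ unknowns and no multiplier, at the price of a less symmetric parametrization.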


8.4.6 
Iteration Process Stability under Equality-Type Constraints 


We shall consider the search process to be stable if the Lagrange function value decreases at each step, i.e.,

$$ Y[y(n)] < Y[y(n-1)]. \qquad (8.10) $$


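Expanding Y(y) into a Taylor series in the neighborhood of the point $y(n-1)$ and discarding the terms of order higher than the second, one obtains

$$ Y[y(n-1) + \Delta] = Y[y(n-1)] + \Delta^{T}\left.\frac{dY(y)}{dy}\right|_{y=y(n-1)} + \frac{1}{2}\Delta^{T}\left.\frac{d^2Y(y)}{dy^2}\right|_{y=y(n-1)}\Delta. $$

Here $\Delta$ is the increment vector of the variables. Taking into account (8.10), one gets the following stability condition:

$$ \Delta^{T}\left.\frac{dY(y)}{dy}\right|_{y=y(n-1)} + \frac{1}{2}\Delta^{T}\left.\frac{d^2Y(y)}{dy^2}\right|_{y=y(n-1)}\Delta < 0. \qquad (8.11) $$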
The iteration procedure gives the following increment at each step:

$$ \Delta = K^*(n-1)\left.\frac{dY(y)}{dy}\right|_{y=y(n-1)}. \qquad (8.12) $$

Putting (8.12) into (8.11), one gets after some transformations

$$ \left.\frac{dY^{T}(y)}{dy}\right|_{y=y(n-1)}\left[K^{*T}(n-1) + K^{*T}(n-1)\left.\frac{d^2Y(y)}{2\,dy^2}\right|_{y=y(n-1)}K^*(n-1)\right]\left.\frac{dY(y)}{dy}\right|_{y=y(n-1)} < 0. $$

Consequently, a sufficient stability condition is the negative definiteness of the following matrix:

$$ G = K^{*T}(n-1) + K^{*T}(n-1)\left.\frac{d^2Y(y)}{2\,dy^2}\right|_{y=y(n-1)}K^*(n-1). $$

This matrix determines the relationship between the parameters of the Lagrange function and the parameters of matrix $K^*$ of the search system.
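For a quadratic function the second-order Taylor expansion is exact, so the change of Y produced by one step equals the quadratic form of G evaluated at the gradient. The sketch below checks this for an unconstrained example with the assumed gain $K^* = -\gamma I$ and tests the negative definiteness of G through the eigenvalues of its symmetric part; all values are hypothetical:

```python
import numpy as np


def stability_matrix(K, hessian):
    """G = K^T + K^T (d^2Y / 2 dy^2) K from Sect. 8.4.6."""
    return K.T + K.T @ (0.5 * hessian) @ K


def is_negative_definite(G):
    """x^T G x < 0 for all x != 0 iff the symmetric part of G has negative eigenvalues."""
    return bool(np.all(np.linalg.eigvalsh(0.5 * (G + G.T)) < 0.0))


A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([1.0, -3.0])


def Y(y):
    return y @ A @ y + B @ y          # quadratic function (8.1a) with C = 0


def grad(y):
    return 2.0 * A @ y + B


hessian = 2.0 * A
y = np.array([1.0, 2.0])
for gamma in (0.1, 1.0):
    K = -gamma * np.eye(2)
    G = stability_matrix(K, hessian)
    g = grad(y)
    print(gamma, is_negative_definite(G), Y(y + K @ g) - Y(y), g @ G @ g)
```

The last two printed columns coincide; for $\gamma = 1$ matrix G is no longer negative definite, so a decrease of Y is not guaranteed for every gradient direction.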


8.4.7 
Convergence of the Iteration Search Method under the Equality-Type Constraints 


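Let us consider the convergence of the search process in the case of a quadratic function. In this case, $Y(a, \lambda) = a^{T}Aa + B^{T}a + C + \lambda^{T}Qa$ and

$$ \frac{dY(a,\lambda)}{d(a,\lambda)} = \begin{bmatrix} 2Aa + B + Q^{T}\lambda \\ Qa \end{bmatrix}. \qquad (8.13) $$

This expression can be rewritten in the form

$$ \frac{dY(a,\lambda)}{d(a,\lambda)} = \begin{bmatrix} 2A & Q^{T} \\ Q & 0 \end{bmatrix}\begin{bmatrix} a \\ \lambda \end{bmatrix} + \begin{bmatrix} B \\ 0 \end{bmatrix} = \tilde{A}y + \tilde{B}, $$

where $y^{T} = [a^{T}, \lambda^{T}]$, and $\tilde{A}$, $\tilde{B}$ denote the block matrix and the block vector. In this case,

$$ y(n) = y(n-1) + K^*(n-1)\left[\tilde{A}y(n-1) + \tilde{B}\right]. $$

As was done above, the following non-recurrent expression can be written for the generalized state variable (for a constant matrix $K^*$, with I the unit matrix):

$$ y(n) = \left(I + K^*\tilde{A}\right)^n y(0) + \left[\left(I + K^*\tilde{A}\right)^n - I\right]\tilde{A}^{-1}\tilde{B}. $$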
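The substitution of $y(n) = -\tilde{A}^{-1}\tilde{B}$, where $\tilde{A} = \begin{bmatrix} 2A & Q^{T} \\ Q & 0 \end{bmatrix}$ and $\tilde{B}^{T} = [B^{T}, 0]$, into (8.13) shows that the gradient vector of the Lagrange function becomes zero, i.e., the point $y = -\tilde{A}^{-1}\tilde{B}$ is the extremum point. For the convergence of the iteration procedure, it is sufficient to prove that

$$ \lim_{n\to\infty}\left(I + K^*\tilde{A}\right)^n = O, $$

where I is the unit matrix and O is the zero matrix. This condition is equivalent to the requirement that all roots $\mu_i$ of the characteristic equation $\det\left[I + K^*\tilde{A} - \mu I\right] = 0$ satisfy $|\mu_i| < 1$.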
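The convergence condition can be tested by computing the spectral radius of $I + K^*\tilde{A}$ with $\tilde{A} = \begin{bmatrix} 2A & Q^{T} \\ Q & 0 \end{bmatrix}$ from (8.13). The sketch (hypothetical example values, one constraint $Q = [1, 1, 1]$) compares three assumed choices of $K^*$: the Newton gain $-\tilde{A}^{-1}$, a plain gain $-\gamma I$, and a gain with opposite signs for the $a$ and $\lambda$ components. Because $\tilde{A}$ is indefinite, the plain gain has an eigenvalue $1 + \gamma|\nu| > 1$ for each negative eigenvalue $\nu$ of $\tilde{A}$ and therefore diverges.

```python
import numpy as np


def augmented_matrix(A, Q):
    """Block matrix [[2A, Q^T], [Q, 0]] of the gradient (8.13); Q is M_1 x (N0 + 1)."""
    m = Q.shape[0]
    return np.block([[2.0 * A, Q.T], [Q, np.zeros((m, m))]])


def spectral_radius(M):
    return float(np.max(np.abs(np.linalg.eigvals(M))))


A = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.5]])
Q = np.array([[1.0, 1.0, 1.0]])
A_t = augmented_matrix(A, Q)
I = np.eye(A_t.shape[0])
gamma = 0.05

gains = {
    "newton": -np.linalg.inv(A_t),
    "plain": -gamma * I,
    "opposite_signs": np.diag([-gamma, -gamma, -gamma, gamma]),
}
for name, K in gains.items():
    print(name, spectral_radius(I + K @ A_t))
```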


8.5 
Iteration Extremum Search Methods for Multivariable Functions 
under Inequality-Type Constraints 


Such constraints arise for the neural network, in particular, because of the limits on the range of variation of the adjustable coefficients, and have the form $q_\mu(a) \le 0$ ($\mu = 1, \ldots, M_2$).

The following constraints of this particular type usually arise for the neural network:

$$ a_i - a_{\max} \le 0, \qquad a_{\min} - a_i \le 0. \qquad (8.13a) $$
In the particular case of the neural network design based on the real physical ele- 
ments, the following conditions are possible: 


8.5.1 
Conditions of Optimality 


The conditions of optimality in this case are given by the Kuhn-Tucker theorem, which generalizes the Lagrange method to the case of inequality-type constraints. According to this theorem, the optimal vector a corresponding to the minimum of a convex functional is the solution of the following system of equations and inequalities:

$$ \frac{\partial Y(a,\lambda)}{\partial a} = \frac{dY(a)}{da} + Q(a)\lambda = 0, \qquad q(a) + \delta = 0, \quad \lambda \ge 0, \qquad \lambda^{T}\delta = 0, \quad \delta \ge 0. \qquad (8.14) $$
The expression for matrix Q is preserved with $M_1$ replaced by $M_2$. In expression (8.14),

$$ \lambda^{T} = [\lambda_1, \lambda_2, \ldots, \lambda_{M_2}], \qquad \delta^{T} = [\delta_1, \delta_2, \ldots, \delta_{M_2}]. $$

The inequalities $\delta \ge 0$ and $\lambda \ge 0$ mean that all components of these vectors are non-negative. It is also assumed that there exists a vector a satisfying the inequalities $q_\mu(a) < 0$. Conditions (8.14) have the following physical sense. If some constraint is not active for the optimal vector $a_{\mathrm{opt}}$, i.e., $q_\mu(a_{\mathrm{opt}}) < 0$ for some $\mu$, then the corresponding $\lambda_\mu = 0$. If $\lambda_\mu > 0$, then it follows from (8.14) that $\delta_\mu = -q_\mu(a_{\mathrm{opt}}) = 0$.

Thus, the Lagrange multipliers can be regarded as estimates of the influence of the constraints upon the optimal value of the adjustable coefficient vector. If the functions Y(a) and $q_\mu(a)$ ($\mu = 1, \ldots, M_2$) are convex, then the Kuhn-Tucker theorem gives necessary and sufficient optimality conditions.


8.5.2 
Algorithm of Extremum Search in the Case of Inequality-Type Constraints 


The optimality conditions (8.14) give the following system of relationships for the it- 
eration extremum search procedure in the case of inequality-type constraints: 


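$$a(n+1) = a(n) + K_{aa}(n)\left[\frac{\partial Y(a,\lambda)}{\partial a}\right]_{a=a(n),\,\lambda=\lambda(n)} + K_{a\lambda}(n)\left[\frac{\partial Y(a,\lambda)}{\partial \lambda}\right]_{a=a(n),\,\lambda=\lambda(n)},$$

$$\lambda(n+1) = \max\left\{0,\ \lambda(n) + K_{\lambda a}(n)\left[\frac{\partial Y(a,\lambda)}{\partial a}\right]_{a=a(n),\,\lambda=\lambda(n)} + K_{\lambda\lambda}(n)\left[\frac{\partial Y(a,\lambda)}{\partial \lambda}\right]_{a=a(n),\,\lambda=\lambda(n)}\right\}, \qquad \lambda(0) \ge 0.$$

Since $\partial Y(a,\lambda)/\partial \lambda = q(a)$, one finally obtains

$$a(n+1) = a(n) + K_{aa}(n)\left[\frac{dY(a)}{da} + Q(a)\,\lambda\right]_{a=a(n),\,\lambda=\lambda(n)} + K_{a\lambda}(n)\,q[a(n)],$$

$$\lambda(n+1) = \max\left\{0,\ \lambda(n) + K_{\lambda a}(n)\left[\frac{dY(a)}{da} + Q(a)\,\lambda\right]_{a=a(n),\,\lambda=\lambda(n)} + K_{\lambda\lambda}(n)\,q[a(n)]\right\}.$$

In the particular case of constraints (8.13a), $q_1(a) = a - a_{\max} \le 0$ and $q_2(a) = a_{\min} - a \le 0$, so that

$$q(a) = \begin{pmatrix} a - a_{\max} \\ a_{\min} - a \end{pmatrix}, \qquad
Q(a) = \begin{pmatrix}
1 & 0 & \cdots & 0 & -1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 & 0 & -1 & \cdots & 0 \\
\vdots & & \ddots & & \vdots & & \ddots & \\
0 & 0 & \cdots & 1 & 0 & 0 & \cdots & -1
\end{pmatrix}.$$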
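A minimal sketch of the iteration of Sect. 8.5.2 for the box constraints (8.13), (8.13a) follows. It assumes a hypothetical quadratic functional $Y(a) = \tfrac{1}{2}\sum_i (a_i - c_i)^2$ with a hypothetical target vector c, and the simplest gain choice $K_{aa} = -\mu I$ (descent in a), $K_{\lambda\lambda} = \nu I$ (ascent in $\lambda$), $K_{a\lambda} = K_{\lambda a} = 0$. The function name and the values of $\mu$ and $\nu$ are illustrative assumptions, not taken from the text.

```python
def primal_dual_box(c, a_min, a_max, mu=0.1, nu=0.1, steps=5000):
    """Sketch of the Sect. 8.5.2 iteration for box constraints (8.13), (8.13a).

    Hypothetical functional: Y(a) = 0.5 * sum((a_i - c_i)**2), so dY/da_i = a_i - c_i.
    Constraints: q1_i = a_i - a_max <= 0 and q2_i = a_min - a_i <= 0,
    so (Q(a) lambda)_i = lam1_i - lam2_i because Q = [I, -I].
    Assumed gains: K_aa = -mu*I, K_ll = nu*I, cross gains zero.
    """
    n = len(c)
    a = [0.0] * n
    lam1 = [0.0] * n  # multipliers of a_i - a_max <= 0
    lam2 = [0.0] * n  # multipliers of a_min - a_i <= 0
    for _ in range(steps):
        # a(n+1) = a(n) + K_aa [dY/da + Q(a) lambda], all evaluated at step n
        a_next = [a[i] - mu * (a[i] - c[i] + lam1[i] - lam2[i]) for i in range(n)]
        # lambda(n+1) = max{0, lambda(n) + K_ll q[a(n)]}
        lam1 = [max(0.0, lam1[i] + nu * (a[i] - a_max)) for i in range(n)]
        lam2 = [max(0.0, lam2[i] + nu * (a_min - a[i])) for i in range(n)]
        a = a_next
    return a, lam1, lam2
```

For $c = (2, -3, 0.5)$ and bounds $[-1, 1]$, the iteration should settle at $a \approx (1, -1, 0.5)$, with a multiplier of about 1 on the first upper bound and about 2 on the second lower bound, while the inactive constraints keep zero multipliers, as conditions (8.14) require.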
8.6
Algorithm of Random Search of Local and Global Extrema for 
Multivariable Functions 


The only reason for introducing randomness into the search for the extremum of the neural network secondary optimization functional is the multimodality of the input signal distribution. This property results in a multi-extremum quality function of the open-loop neural network with the given structure. Random search methods are described in detail in [8-2, 8-16].

The problem is to find the local minima of the multi-extremum neural network error functional and to select the global minimum. Let us describe one cycle of the random search algorithm:


a The vector of function variables is selected at random. This vector lies in the region of some local extremum;

b This extremum is found by any of the non-random search methods given above;

c The value of the found extremum and the corresponding vector of variables are compared with those stored in memory. If this extremum is not yet in memory, it is stored;

d Return to step "a".
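Steps a to d can be sketched as a multistart loop. The sketch assumes a one-dimensional hypothetical test function with several minima, plain gradient descent with a central-difference derivative as the non-random method of step b, and treats two extrema as identical when they are closer than `tol`. All names are illustrative.

```python
import math
import random

def demo_function(x):
    """Hypothetical multi-extremum functional used only for illustration."""
    return math.sin(3.0 * x) + 0.1 * x * x

def local_descent(f, x, step=0.01, iters=2000, h=1e-6):
    """Step b: a non-random search (gradient descent, central-difference derivative)."""
    for _ in range(iters):
        x -= step * (f(x + h) - f(x - h)) / (2.0 * h)
    return x

def random_search(f, lo, hi, cycles=50, tol=1e-3, seed=0):
    rng = random.Random(seed)
    memory = []                                   # stored (x, f(x)) pairs
    for _ in range(cycles):
        x0 = rng.uniform(lo, hi)                  # step a: random initial point
        x = local_descent(f, x0)                  # step b: local extremum
        if all(abs(x - m) > tol for m, _ in memory):
            memory.append((x, f(x)))              # step c: store a new extremum
        # step d: return to step a (next loop iteration)
    best = min(memory, key=lambda item: item[1])  # global minimum among those found
    return memory, best
```

With `memory, best = random_search(demo_function, -3.0, 3.0)`, `best[0]` should lie near $x \approx -0.51$, the global minimum of `demo_function` on this interval.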


The results of experimental investigation of this algorithm are given in Chap. 12. The convergence rate can, in principle, be increased by excluding already found regions from the region in which the initial vector is chosen at random.

Let us analyze the convergence of this random search algorithm with respect to the number of extrema U. Let i modes be already found ($0 \le i < U$). Under a uniform distribution of the modes, the probability of falling into the region of one of these modes is $i/U$. The random value $\xi_i$, the number of steps from finding the i-th mode to finding the (i + 1)-th mode, takes the value $\xi_i = k$ with the probability

$$P(\xi_i = k) = \left(\frac{i}{U}\right)^{k-1}\left(1 - \frac{i}{U}\right). \qquad (8.15)$$


The random search procedure is performed independently at each step. Let us introduce a new random value

$$\eta_j = \sum_{i=0}^{j-1} \xi_i \qquad (0 < j \le U),$$

representing the number of steps required to find j modes out of U. The independent events $\xi_1 = k_1, \dots, \xi_{j-1} = k_{j-1}$, where $1 \le k_1 \le s+1, \dots, 1 \le k_{j-1} \le s+1$ and $k_1 + \dots + k_{j-1} = s + j - 1$, together give the event $\eta_j = \xi_0 + \dots + \xi_{j-1} = s + j$, since $\xi_0 = 1$ with unit probability. Due to the independence of the $\xi_i$, the probability of such an event is

$$P(\xi_1 = k_1) \cdots P(\xi_{j-1} = k_{j-1}).$$
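According to the formula of total probability,

$$P(\eta_j = s + j) = \sum_{\substack{k_1 + \dots + k_{j-1} = s + j - 1 \\ k_r \ge 1}} P(\xi_1 = k_1) \cdots P(\xi_{j-1} = k_{j-1}) = U^{1-s-j}\,\frac{(U-1)!}{(U-j)!} \sum_{\substack{k_1 + \dots + k_{j-1} = s + j - 1 \\ k_r \ge 1}} \prod_{i=1}^{j-1} i^{\,k_i - 1}. \qquad (8.16)$$

In the particular case of $j = U$,

$$P(\eta_U = s + U) = U^{1-s-U}\,(U-1)! \sum_{\substack{k_1 + \dots + k_{U-1} = s + U - 1 \\ k_r \ge 1,\ r = 1,\dots,U-1}} \prod_{i=1}^{U-1} i^{\,k_i - 1},$$

where $P(\eta_U = s + U)$ is the probability that U modes will be found in $s + U$ steps of the random search procedure. It can be shown that the mean value and the variance of the number of steps required to find U modes are

$$M\eta_U = 1 + U \sum_{r=1}^{U-1} \frac{1}{r} \approx U\left[\ln(U-1) + 0.577\right],$$

$$D\eta_U = \sum_{r=1}^{U-1} \frac{U(U-r)}{r^2} \approx \frac{\pi^2}{6}\,U^2 - U\left[\ln(U-1) + 0.577\right]. \qquad (8.16a)$$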

The analysis of these expressions shows that the convergence rate of this search procedure is sufficient: the mean number of steps grows only as $U \ln U$. The procedure can be generalized to the case when the region of an already found mode is excluded from the random search, as mentioned above. The above expressions are also valid in the multidimensional case.
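The mean and variance in (8.16a) can be checked numerically. The sketch below assumes, as in the derivation, that the U mode regions are equiprobable and that each random start falls into exactly one of them; the Monte Carlo helper `simulate_steps` is our own illustration.

```python
import random

def mean_steps_exact(U):
    """Mean of eta_U from (8.16a): 1 + U * sum_{r=1}^{U-1} 1/r."""
    return 1 + U * sum(1.0 / r for r in range(1, U))

def var_steps_exact(U):
    """Variance of eta_U from (8.16a): sum_{r=1}^{U-1} U (U - r) / r**2."""
    return sum(U * (U - r) / r ** 2 for r in range(1, U))

def simulate_steps(U, runs=20000, seed=0):
    """Monte Carlo: count steps until all U equiprobable mode regions are hit."""
    rng = random.Random(seed)
    counts = []
    for _ in range(runs):
        found, steps = set(), 0
        while len(found) < U:
            found.add(rng.randrange(U))  # each step lands in a uniformly chosen mode region
            steps += 1
        counts.append(steps)
    mean = sum(counts) / runs
    var = sum((c - mean) ** 2 for c in counts) / (runs - 1)
    return mean, var
```

For $U = 5$, `mean_steps_exact(5)` equals $137/12 \approx 11.42$, and the simulated mean from `simulate_steps(5)` should agree with it closely.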


8.7
Development of the Neural Network Adaptation Algorithms with the
Use of Estimations of the Second Order Derivatives of the Secondary
Optimization Functional


Let us consider the development of the neural network adaptation algorithms with the 
use of simultaneous estimations of the first and second order derivatives of the sec- 
ondary optimization functional. 


8.7.1 
Development of Search Algorithms 


Let us consider the problem of extremum search in the form of the equivalent problem of finding the root of the following system of equations:

$$\begin{aligned} D_1(x_1,\dots,x_N) &= 0, \\ &\;\;\vdots \\ D_N(x_1,\dots,x_N) &= 0. \end{aligned} \qquad (8.17)$$
When considering the implicitly given system

$$\begin{aligned} y_1 - D_1(x_1,\dots,x_N) &= 0, \\ &\;\;\vdots \\ y_N - D_N(x_1,\dots,x_N) &= 0 \end{aligned} \qquad (8.18)$$

that satisfies the Yung theorem, there exist functions

$$\begin{aligned} x_1(y_1,\dots,y_N) &= F_1(y_1,\dots,y_N), \\ &\;\;\vdots \\ x_N(y_1,\dots,y_N) &= F_N(y_1,\dots,y_N) \end{aligned} \qquad (8.19)$$

whose substitution into (8.18) yields an identity. Let us expand the functions $F_1(y_1,\dots,y_N), \dots, F_N(y_1,\dots,y_N)$ into Taylor series with two terms:

$$F_m(0) = F_m(y) - \sum_{j=1}^{N} \frac{\partial F_m(y)}{\partial y_j}\,y_j + \frac{1}{2}\sum_{j=1}^{N}\sum_{l=1}^{N} \frac{\partial^2 F_m(y)}{\partial y_j\,\partial y_l}\,y_j y_l + R_m, \qquad m = 1,\dots,N. \qquad (8.20)$$


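Differentiation of (8.18) with the use of (8.19) results in the following system of equations:

$$\sum_{i=1}^{N} \frac{\partial D_m}{\partial x_i}\,\frac{\partial x_i}{\partial y_k} = \begin{cases} 1, & k = m, \\ 0, & k \neq m, \end{cases} \qquad m, k = 1,\dots,N, \qquad (8.21)$$

so that

$$\begin{bmatrix} \dfrac{\partial F_1}{\partial y_1} & \cdots & \dfrac{\partial F_1}{\partial y_N} \\ \vdots & & \vdots \\ \dfrac{\partial F_N}{\partial y_1} & \cdots & \dfrac{\partial F_N}{\partial y_N} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial D_1}{\partial x_1} & \cdots & \dfrac{\partial D_1}{\partial x_N} \\ \vdots & & \vdots \\ \dfrac{\partial D_N}{\partial x_1} & \cdots & \dfrac{\partial D_N}{\partial x_N} \end{bmatrix}^{-1}.$$

After differentiation of (8.21) with respect to $y_l$, one obtains

$$\sum_{i=1}^{N} \frac{\partial D_m}{\partial x_i}\,\frac{\partial^2 x_i}{\partial y_k\,\partial y_l} + \sum_{i=1}^{N}\sum_{j=1}^{N} \frac{\partial^2 D_m}{\partial x_i\,\partial x_j}\,\frac{\partial x_i}{\partial y_k}\,\frac{\partial x_j}{\partial y_l} = 0, \qquad m = 1,\dots,N.$$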


i=1 Oy, OY] Ox; i=1 j=l 
After multiplying both sides of these equations by $y_k y_l$ and summing over k and l, one obtains

$$\sum_{i=1}^{N} \frac{\partial D_m}{\partial x_i} \sum_{k=1}^{N}\sum_{l=1}^{N} \frac{\partial^2 x_i}{\partial y_k\,\partial y_l}\,y_k y_l = -\sum_{i=1}^{N}\sum_{j=1}^{N} \frac{\partial^2 D_m}{\partial x_i\,\partial x_j} \sum_{k=1}^{N}\sum_{l=1}^{N} \frac{\partial x_i}{\partial y_k}\,\frac{\partial x_j}{\partial y_l}\,y_k y_l = C_m, \qquad m = 1,\dots,N.$$

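This system can be rewritten in the form

$$\begin{bmatrix} \sum_{k}\sum_{l} \dfrac{\partial^2 x_1}{\partial y_k\,\partial y_l}\,y_k y_l \\ \vdots \\ \sum_{k}\sum_{l} \dfrac{\partial^2 x_N}{\partial y_k\,\partial y_l}\,y_k y_l \end{bmatrix} = \begin{bmatrix} \dfrac{\partial D_1}{\partial x_1} & \cdots & \dfrac{\partial D_1}{\partial x_N} \\ \vdots & & \vdots \\ \dfrac{\partial D_N}{\partial x_1} & \cdots & \dfrac{\partial D_N}{\partial x_N} \end{bmatrix}^{-1} \begin{bmatrix} C_1 \\ \vdots \\ C_N \end{bmatrix}.$$

Taking into account that the vector

$$\alpha = \begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{bmatrix} = \begin{bmatrix} F_1(0) \\ \vdots \\ F_N(0) \end{bmatrix}$$

is the root of the system (8.17), and that

$$\begin{bmatrix} \sum_{j} \dfrac{\partial F_1}{\partial y_j}\,y_j \\ \vdots \\ \sum_{j} \dfrac{\partial F_N}{\partial y_j}\,y_j \end{bmatrix} = \begin{bmatrix} \dfrac{\partial D_1}{\partial x_1} & \cdots & \dfrac{\partial D_1}{\partial x_N} \\ \vdots & & \vdots \\ \dfrac{\partial D_N}{\partial x_1} & \cdots & \dfrac{\partial D_N}{\partial x_N} \end{bmatrix}^{-1} \begin{bmatrix} D_1(x) \\ \vdots \\ D_N(x) \end{bmatrix},$$

$$C_m = -\sum_{i=1}^{N}\sum_{j=1}^{N} \frac{\partial^2 D_m}{\partial x_i\,\partial x_j} \sum_{k=1}^{N}\sum_{l=1}^{N} \frac{\partial x_i}{\partial y_k}\,\frac{\partial x_j}{\partial y_l}\,y_k y_l,$$

it follows from (8.20) that

$$\alpha = x - W^{-1}D + \frac{1}{2}\,W^{-1}C,$$

where $W = [\partial D_m / \partial x_i]$ is the matrix of second order derivatives of the functional.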
Consequently, the general expression for the search algorithm for the extremum of a multivariable function with the matrix of second order derivatives is

$$x(n+1) = x(n) - W^{-1}[x(n)]\,D[x(n)] + \frac{1}{2}\,W^{-1}[x(n)]\,C[x(n)]. \qquad (8.22)$$
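A sketch of iteration (8.22) for N = 2 is given below. It assumes a hypothetical functional $Y(x) = (x_1^4 + x_2^4)/4 + x_1 x_2/2 - 2x_1 - 3x_2$, so that $D = \partial Y/\partial x$, W is its Hessian, and $C_m$ is computed from its definition using $\sum_k (\partial x_i/\partial y_k)\,y_k = (W^{-1}D)_i$. The function names are illustrative.

```python
def solve2(A, b):
    """Solve the 2x2 linear system A z = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

def D(x):
    # system (8.17) for the hypothetical Y: its gradient
    return [x[0] ** 3 + 0.5 * x[1] - 2.0, x[1] ** 3 + 0.5 * x[0] - 3.0]

def W(x):
    # matrix of derivatives dD_m/dx_i (Hessian of Y)
    return [[3.0 * x[0] ** 2, 0.5], [0.5, 3.0 * x[1] ** 2]]

def second_derivatives_of_D(x):
    # d^2 D_m / dx_i dx_j for m = 1, 2 (all other entries are zero)
    return [[[6.0 * x[0], 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 6.0 * x[1]]]]

def step_822(x):
    """One step x(n+1) = x - W^{-1} D + 0.5 W^{-1} C of (8.22)."""
    Wx = W(x)
    u = solve2(Wx, D(x))                  # u_i = sum_k (dx_i/dy_k) y_k
    H = second_derivatives_of_D(x)
    C = [-sum(H[m][i][j] * u[i] * u[j] for i in range(2) for j in range(2))
         for m in range(2)]
    v = solve2(Wx, C)                     # W^{-1} C
    return [x[i] - u[i] + 0.5 * v[i] for i in range(2)]
```

Starting from `x = [1.0, 1.0]`, a few calls of `step_822` drive both components of `D(x)` to zero; dropping the `0.5 * v[i]` term reduces the step to Newton's method.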


8.7.2 
One-Dimensional Case 


In this case, $D(x) = 0$; $F(y) = D^{-1}(y)$; $x = F[D(x)]$, ($x \in [a, b]$); $y = D[F(y)]$. If $\alpha$ is the root of the equation, then $\alpha = F(0)$. Let us expand F(y) into the series

$$F(0) - F(y) = \sum_{k=1}^{r} \frac{F^{(k)}(y)}{k!}\,(-y)^k + R_{r+1},$$

or, in another form,

$$\alpha = x + \sum_{k=1}^{r} \frac{(-1)^k F^{(k)}[D(x)]}{k!}\,D^k(x) + R_{r+1}.$$

The differentiation of the initial equations gives

$$F'[D(x)]\,D'(x) = 1,$$

$$F''[D(x)]\,D'^2(x) + F'[D(x)]\,D''(x) = 0,$$

$$F'''[D(x)]\,D'^3(x) + 3F''[D(x)]\,D'(x)\,D''(x) + F'[D(x)]\,D'''(x) = 0.$$

One finally obtains in the case of r = 2

$$x(n+1) = x(n) - \frac{D(x_n)}{D'(x_n)} - \frac{D''(x_n)\,D^2(x_n)}{2D'^3(x_n)}, \qquad (8.23)$$

where $D''(x_n) = K(n)$.
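In one dimension, (8.23) coincides with the third-order root-finding iteration commonly known as Chebyshev's method. The sketch applies it to a hypothetical functional $Y(x) = x^4/4 - 2x$, whose derivative $D(x) = x^3 - 2$ vanishes at $x = 2^{1/3}$; the function names are illustrative.

```python
def chebyshev_step(D, dD, d2D, x):
    """One step of (8.23): x - D/D' - D'' * D**2 / (2 * D'**3)."""
    d, d1, d2 = D(x), dD(x), d2D(x)
    return x - d / d1 - d2 * d * d / (2.0 * d1 ** 3)

def find_root(D, dD, d2D, x0, tol=1e-14, max_iter=50):
    """Iterate (8.23) until successive estimates agree to within tol."""
    x = x0
    for n in range(max_iter):
        x_next = chebyshev_step(D, dD, d2D, x)
        if abs(x_next - x) < tol:
            return x_next, n + 1
        x = x_next
    return x, max_iter
```

Starting from $x = 1$ with `D = lambda x: x**3 - 2`, `dD = lambda x: 3*x**2`, `d2D = lambda x: 6*x`, the first step gives $1 + 1/3 - 1/9 \approx 1.222$, and a handful of further steps reach $2^{1/3}$ to machine precision.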
Chapter 9


Neural Network Adjustment Algorithms 


9.1 
Problem Statement 


It was mentioned in Chap. 7 that the secondary optimization functional is selected on the basis of the given general characteristics of the input signal, the primary optimization criterion, and the open-loop neural network structure.

The closed-loop neural network is an open-loop neural network with an adjustment unit included. The closed-loop neural network is developed on the basis of the selected secondary optimization criterion and the method of extremum search for the given functional [9-1 to 9-4]. A unit is synthesized for calculating the parameters of the neural network quality functional that are required to organize the iterative search. The main problem in this case is the estimation of the gradient vector of the secondary optimization functional. There are two possibilities to solve this problem: a search procedure based on analysis of the results of search oscillations, or the determination of the gradient vector estimate in the form of an analytical expression.

In the first case, a search neural network is used, and in the second one, an analytical neural network is used. The analytical approach is preferable but not always possible. If the system does not provide a signal characterizing the gradient of the optimization functional, then the search oscillation methods must be used.

We consider below recognition systems of different types: a neuron with two solutions for two pattern classes, a neuron with $K_p$ solutions for $K_p$ pattern classes, a neuron with a solution continuum and a continuum of pattern classes, multilayer neural networks consisting of neurons with a solution continuum, with or without constraints upon the adjustable coefficients, multilayer neural networks with $N^*$-dimensional signals $\varepsilon(n)$ and $y(n)$, and multilayer neural networks with cross and backward connections.

Methods of neural network design can easily be generalized to the case of non-stationary patterns, when the realization of the gradient vector is a realization of a non-stationary multidimensional random process. This property of the gradient determines the method of designing the multidimensional filter in the neural network adjustment unit.

The problem of multilayer neural network design in the learning mode and in the mode with arbitrary teacher qualification requires special attention. Methods of closed-loop neural network design in the latter mode are the same as in the learning mode.

A neural network adjustment algorithm with a closed cycle is designed by substituting the expression for the estimate of the functional gradient vector into the corresponding expression for the search procedure.
9.2
Neuron with Two-Solution Continuums 


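Four secondary optimization functionals, $|\alpha_{1g}|$, $\alpha_{2g}$, $|\alpha_1|$, and $\alpha_2$, are considered below for the neuron with two solutions. The estimate of the absolute value of the first moment of the analog error has the form

$$\left|\overline{x_g(n)}^{\,m_g}\right| = \left|\overline{\varepsilon(n)}^{\,m_g} - \sum_{i=0}^{N} a_i\,\overline{x_i(n)}^{\,m_g}\right|, \qquad (x_0 = -1).$$

Consequently,

$$\frac{\partial \left|\overline{x_g(n)}^{\,m_g}\right|}{\partial a_i} = -\operatorname{sign}\left[\overline{x_g(n)}^{\,m_g}\right]\overline{x_i(n)}^{\,m_g}, \qquad i = 0,\dots,N.$$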

The recurrent expression representing a basis for the design of a neuron adjustable through the closed cycle has the following form:

$$a(n+1) = a(n) - K^{*}\operatorname{sign}\left[\overline{x_g(n)}^{\,m_g}\right]\overline{x(n)}^{\,m_g}. \qquad (9.1)$$


The choice of the parameters of matrix $K^*$ is the final goal of the analysis and synthesis of closed-loop neural networks. Some constraints can be imposed upon this matrix even at this stage. In [9-5, 9-6], these constraints are determined for the stochastic approximation method. One can require the iteration procedure to reach the extremum of $|\alpha_{1g}|$ at each step, i.e., require the validity of the following condition (in the one-dimensional case):

$$\overline{\varepsilon(n)}^{\,m_g} - \overline{x(n)}^{\,m_g}\,a_1(n+1) + a_0(n+1) = 0.$$

One of the possible sets of elements of matrix $K^*$ is

$$K^{*}_{11} = -\left|\overline{x_g(n)}^{\,m_g}\right|, \qquad K^{*}_{12} = K^{*}_{21} = K^{*}_{22} = 0.$$


The choice of $m_g$ in the expression

$$\overline{x_g}^{\,m_g} = \frac{1}{m_g}\sum_{i=n-m_g+1}^{n} x_g(i)$$

is also a problem of the analysis and synthesis of closed-loop neural networks. Increasing $m_g$ decreases the noise level in the measurement of the secondary optimization functional gradient, but it increases the delay in the closed-cycle adjustment circuit.
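A sketch of the closed-cycle adjustment (9.1) with a gradient-estimation window of length $m_g$ is shown below. It assumes the gain $K^* = -\mu I$ (so that (9.1) moves against the gradient estimate), a linear neuron $g = \sum_i a_i x_i$ with $x_0 = -1$, and a hypothetical noiseless teacher $\varepsilon = 2x_1 - x_2 + 0.5$ on a deterministic set of inputs; all names and values are illustrative.

```python
def sign(v):
    return (v > 0) - (v < 0)

def teacher_samples(n=200):
    """Hypothetical noiseless teacher eps = 2*x1 - x2 + 0.5 on a deterministic
    pseudo-random set of inputs in [-1, 1]^2."""
    out = []
    for k in range(n):
        x1 = (k * 37 % 101) / 50.0 - 1.0
        x2 = (k * 59 % 103) / 51.0 - 1.0
        out.append(((x1, x2), 2.0 * x1 - x2 + 0.5))
    return out

def train_sign_error(samples, mu=0.005, m_g=1, passes=100):
    """Closed-cycle adjustment (9.1) for the |alpha_1g| criterion.

    The gradient estimate -sign(mean x_g) * mean x uses the last m_g samples;
    with the assumed gain K* = -mu*I, (9.1) becomes a += mu*sign(mean x_g)*mean x.
    """
    a = [0.0] * (len(samples[0][0]) + 1)        # a_0 multiplies x_0 = -1
    window = []                                  # last m_g pairs (x_g, x)
    for _ in range(passes):
        for x_raw, eps in samples:
            x = [-1.0] + list(x_raw)
            g = sum(ai * xi for ai, xi in zip(a, x))
            window.append((eps - g, x))          # analog error x_g = eps - g
            if len(window) > m_g:
                window.pop(0)
            xg_bar = sum(e for e, _ in window) / len(window)
            x_bar = [sum(v[i] for _, v in window) / len(window) for i in range(len(a))]
            s = sign(xg_bar)
            a = [ai + mu * s * xb for ai, xb in zip(a, x_bar)]
    return a
```

With this teacher the coefficients should approach $a_0 = -0.5$, $a_1 = 2$, $a_2 = -1$ (recall $x_0 = -1$), hovering within a small band whose width is set by $\mu$.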


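In the case of minimization of the second order moment of the analog error (here $\varepsilon^2(n) = 1$),

$$\overline{x_g^2(n)}^{\,m_g} = 1 + \overline{\left(\sum_{i=0}^{N} a_i x_i(n)\right)^2}^{\,m_g} - 2\,\overline{\varepsilon(n)\sum_{i=0}^{N} a_i x_i(n)}^{\,m_g}.$$

Consequently,

$$\frac{\partial\,\overline{x_g^2(n)}^{\,m_g}}{\partial a_i} = -2\,\overline{x_g(n)\,x_i(n)}^{\,m_g}, \qquad i = 0,\dots,N,$$

$$a(n+1) = a(n) - 2K^{*}\,\overline{x_g(n)\,x(n)}^{\,m_g}.$$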
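For $m_g = 1$ the last recurrence is the familiar LMS (Widrow-Hoff) rule. The sketch assumes the gain $K^* = -K I$ with a small $K > 0$, matching the sign convention of (9.1), and a neuron with $x_0 = -1$; the function name and step value are illustrative.

```python
def lms_train(samples, K=0.05, passes=200):
    """Closed-cycle adjustment for the alpha_2g criterion with m_g = 1.

    With the assumed gain K* = -K*I, a(n+1) = a(n) - 2K* x_g(n) x(n)
    becomes the LMS step a += 2*K*x_g*x.
    """
    a = [0.0] * (len(samples[0][0]) + 1)
    for _ in range(passes):
        for x_raw, eps in samples:
            x = [-1.0] + list(x_raw)                         # x_0 = -1
            xg = eps - sum(ai * xi for ai, xi in zip(a, x))  # analog error
            a = [ai + 2.0 * K * xg * xi for ai, xi in zip(a, x)]
    return a
```

Unlike the sign-error rule, this step shrinks as the error shrinks, so for a noiseless linear teacher the coefficients converge to the exact solution rather than to a band around it.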
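In the case of minimization of the first order moment of the neural network discrete error,

$$\left|\overline{X_g(n)}^{\,m_g}\right| = \left|\overline{\varepsilon(n)}^{\,m_g} - \overline{\operatorname{sign}\sum_{j=0}^{N} a_j x_j(n)}^{\,m_g}\right|,$$

$$\frac{\partial \left|\overline{X_g(n)}^{\,m_g}\right|}{\partial a_i} = -\operatorname{sign}\left[\overline{X_g(n)}^{\,m_g}\right]\overline{\frac{\partial\,\operatorname{sign} g(n)}{\partial a_i}}^{\,m_g}.$$

For the extremum search, one can use information about the signs and values of the first and second derivatives, etc. The value of the first derivative in this case cannot be determined, and one must use information about its sign. Taking into account that

$$\frac{\partial}{\partial a_i}\operatorname{sign}\sum_{j=0}^{N} a_j x_j(n) = \lim_{\beta\to\infty}\frac{\partial}{\partial a_i}\,\frac{2}{\pi}\operatorname{arctg}\Big(\beta\sum_{j=0}^{N} a_j x_j(n)\Big) = \lim_{\beta\to\infty}\frac{2}{\pi}\,\frac{\beta\,x_i(n)}{1 + \beta^2 g^2(n)},$$

then

$$\operatorname{sign}\left[\frac{\partial\,\operatorname{sign} g(n)}{\partial a_i}\right] = \operatorname{sign}\left[x_i(n)\lim_{\beta\to\infty}\frac{2}{\pi}\,\frac{\beta}{1 + \beta^2 g^2(n)}\right] = \operatorname{sign} x_i(n).$$

Consequently,

$$\frac{\partial\,|X_g(n)|}{\partial a_i} = -\operatorname{sign} X_g(n)\,\operatorname{sign} x_i(n), \qquad i = 0,\dots,N. \qquad (9.2)$$

Here and below, the latter expression is also conditionally termed an estimate of the gradient vector, although, in principle, it is a pseudo-gradient obtained by replacing the derivative $\partial y/\partial a_i$ by its sign.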

In this case, it is impossible to create a closed-cycle adjustment algorithm satisfying the criterion of minimum $|\alpha_1|$ for an arbitrary memory $m_g$ of the gradient estimation filter. To show this, let us represent the measured values of the secondary optimization functional gradient as a random process. In the general case, including the criterion of minimum $\alpha_1$, the measured gradient at the current step can be conditionally represented as the product of two multipliers $x_1(n)x_2(n)$. One of the multipliers, for example $\partial[\operatorname{sign} g(n)]/\partial a_i$, cannot be calculated directly from the neural network signals; only its sign can be determined in this way. Replacing this multiplier by its sign in the expression for the gradient, for an arbitrary value of $m_g$, results in the loss of the possibility to determine the sign of the gradient estimate because, in the general case,

$$\operatorname{sign}\,\overline{x_1(n)\,x_2(n)}^{\,m_g} \neq \operatorname{sign}\,\overline{x_1(n)\,\operatorname{sign} x_2(n)}^{\,m_g}.$$

Consequently, analytical adjustment algorithms for the neural network with two solutions and the closed-cycle secondary optimization functional for the discrete error can be designed only if $m_g = 1$. Otherwise, if $m_g > 1$, a search adjustment procedure must be developed. In any case, an adjustment procedure for the estimation of $\partial y/\partial a$ in the expression for the secondary optimization functional gradient must be introduced.

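Expression (9.2) represents a basis for the implementation of a corresponding closed-loop neural network. In the case of the neural network with $\alpha_2$ minimization,

$$\frac{\partial X_g^2(n)}{\partial a_i} = -2X_g(n)\,\operatorname{sign} x_i(n).$$

It is evident that the adjustment algorithms with the minimization criteria $|\alpha_1|$ and $\alpha_2$ are the same if $m_g = 1$: for a neuron with two solutions, $X_g(n)$ takes only the values 0 and $\pm 2$, so the two gradient estimates differ only by a constant factor.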
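The pseudo-gradient (9.2) gives a very simple adjustment step for a neuron with two solutions when $m_g = 1$. The sketch assumes a descent step $a \leftarrow a + K\,\operatorname{sign}X_g\,\operatorname{sign}x$ with a hypothetical step K, and $x_0 = -1$; the function name is illustrative.

```python
def sign(v):
    return (v > 0) - (v < 0)

def sign_sign_step(a, x_raw, eps, K=0.1):
    """Pseudo-gradient step for a neuron with two solutions (m_g = 1).

    y = sign(g), X_g = eps - y; by (9.2) the gradient estimate of |X_g|
    is -sign(X_g) * sign(x_i), so the step is a_i += K * sign(X_g) * sign(x_i).
    """
    x = [-1.0] + list(x_raw)                 # x_0 = -1
    g = sum(ai * xi for ai, xi in zip(a, x))
    y = sign(g)
    Xg = eps - y                             # discrete error: 0 or +-2
    return [ai + K * sign(Xg) * sign(xi) for ai, xi in zip(a, x)]
```

A correctly classified pattern gives $X_g = 0$ and leaves the coefficients unchanged; only errors move them, which is the practical consequence of the pseudo-gradient replacing $\partial y/\partial a_i$ by its sign.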
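In the case of a neuron with a solution continuum,

$$y(n) = F[g(n)] = F\left[\sum_{i=0}^{N} a_i x_i(n)\right], \qquad X_g(n) = \varepsilon(n) - F\left[\sum_{i=0}^{N} a_i x_i(n)\right].$$

In the case of minimization of $|\alpha_1|$ and $\alpha_2$, respectively,

$$\frac{\partial \left|\overline{X_g(n)}^{\,m_g}\right|}{\partial a_i} = -\operatorname{sign}\left[\overline{X_g(n)}^{\,m_g}\right]\overline{\frac{dF(g)}{dg}\,x_i(n)}^{\,m_g},$$

$$\frac{\partial\,\overline{X_g^2(n)}^{\,m_g}}{\partial a_i} = -2\,\overline{X_g(n)\,\frac{dF(g)}{dg}\,x_i(n)}^{\,m_g}.$$

Recurrent algorithms for the development of the closed-loop neural network in the considered cases are

$$a(n+1) = a(n) + K^{*}\operatorname{sign}\left[\overline{X_g(n)}^{\,m_g}\right]\overline{\frac{dF(g)}{dg}\,x(n)}^{\,m_g},$$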
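$$a(n+1) = a(n) + 2K^{*}\,\overline{X_g(n)\,\frac{dF(g)}{dg}\,x(n)}^{\,m_g}.$$

In the particular case of $F(g) = (2/\pi)\operatorname{arctg}\beta g$,

$$a(n+1) = a(n) + K^{*}\operatorname{sign}\left[\overline{X_g(n)}^{\,m_g}\right]\frac{2\beta}{\pi}\,\overline{\frac{x(n)}{1 + \beta^2 g^2(n)}}^{\,m_g},$$

$$a(n+1) = a(n) + K^{*}\,\frac{4\beta}{\pi}\,\overline{\frac{X_g(n)\,x(n)}{1 + \beta^2 g^2(n)}}^{\,m_g}.$$


9.3
Two-Layer Neural Networks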


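Let us consider a two-layer neural network with fully connected neurons adjustable by a closed cycle. In this case,

$$y(n) = F[g(n)] = F\left[\sum_{j=0}^{H_1} a_j\,y_j(n)\right] = F\left[\sum_{j=0}^{H_1} a_j\,F\left(\sum_{i=0}^{N} a_{ij}\,x_i(n)\right)\right].$$

Here

$$x_g(n) = \varepsilon(n) - g(n) = \varepsilon(n) - \sum_{j=0}^{H_1} a_j\,F[g_j(n)],$$

$$X_g(n) = \varepsilon(n) - y(n) = \varepsilon(n) - F\left[\sum_{j=0}^{H_1} a_j\,F\left(\sum_{i=0}^{N} a_{ij}\,x_i(n)\right)\right].$$

Table 9.1. Expressions for the adjustable coefficients of the neuron in the first and second layers

$$\begin{array}{c|c|c}
\text{Secondary optimization functional} & \dfrac{\partial(\cdot)}{\partial a_j} & \dfrac{\partial(\cdot)}{\partial a_{ij}} \\ \hline
|\alpha_{1g}|,\ \big|\overline{x_g(n)}\big| & -\operatorname{sign}\big[\overline{x_g(n)}\big]\,\overline{y_j(n)} & -\operatorname{sign}\big[\overline{x_g(n)}\big]\,a_j\,\overline{\dfrac{dF(g_j)}{dg_j}\,x_i(n)} \\[2ex]
\alpha_{2g},\ \overline{x_g^2(n)} & -2\,\overline{x_g(n)\,y_j(n)} & -2a_j\,\overline{x_g(n)\,\dfrac{dF(g_j)}{dg_j}\,x_i(n)} \\[2ex]
|\alpha_1|,\ \big|\overline{X_g(n)}\big| & -\operatorname{sign}\big[\overline{X_g(n)}\big]\,\overline{\dfrac{dF(g)}{dg}\,y_j(n)} & -\operatorname{sign}\big[\overline{X_g(n)}\big]\,a_j\,\overline{\dfrac{dF(g)}{dg}\,\dfrac{dF(g_j)}{dg_j}\,x_i(n)} \\[2ex]
\alpha_2,\ \overline{X_g^2(n)} & -2\,\overline{X_g(n)\,\dfrac{dF(g)}{dg}\,y_j(n)} & -2a_j\,\overline{X_g(n)\,\dfrac{dF(g)}{dg}\,\dfrac{dF(g_j)}{dg_j}\,x_i(n)}
\end{array}$$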
The main problem is the derivation of the expressions for the secondary optimization functional gradient through the output and intermediate neural network signals. Table 9.1 shows these expressions for the adjustable coefficients of the neuron in the first and second layers.

Tables 9.2 and 9.3 show expressions for the secondary optimization functional gradients in the cases of $F(g) = \operatorname{sign}(g)$ and $F(g) = (2/\pi)\operatorname{arctg}\beta g$.
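The $\alpha_2$ row of Table 9.1 (with $m_g = 1$) can be checked against finite differences. The sketch assumes $F(g) = (2/\pi)\operatorname{arctg}\beta g$ with a hypothetical $\beta = 2$, and prepends a constant $-1$ to both the input vector and the vector of first-layer outputs, so that $a_0$ and $a_{0j}$ act as thresholds; all names and values are illustrative.

```python
import math

BETA = 2.0  # hypothetical slope parameter beta

def F(g):
    return (2.0 / math.pi) * math.atan(BETA * g)

def dF(g):
    return (2.0 / math.pi) * BETA / (1.0 + (BETA * g) ** 2)

def forward(a2, a1, x):
    """a1[j][i]: first-layer coefficient a_ij; a2[j]: second-layer coefficient a_j.
    x and the vector of first-layer outputs both start with a constant -1."""
    g1 = [sum(row[i] * x[i] for i in range(len(x))) for row in a1]
    y1 = [-1.0] + [F(gj) for gj in g1]
    g = sum(a2[j] * y1[j] for j in range(len(y1)))
    return g1, y1, g, F(g)

def gradients(a2, a1, x, eps):
    """Table 9.1, criterion alpha_2 with m_g = 1 (no averaging)."""
    g1, y1, g, y = forward(a2, a1, x)
    Xg = eps - y
    d_a2 = [-2.0 * Xg * dF(g) * y1[j] for j in range(len(y1))]
    # first-layer neuron j feeds y1[j + 1], hence the coefficient a2[j + 1]
    d_a1 = [[-2.0 * Xg * dF(g) * a2[j + 1] * dF(g1[j]) * x[i]
             for i in range(len(x))] for j in range(len(a1))]
    return d_a2, d_a1
```

The first-layer entries are the chain-rule product that Table 9.1 lists, $-2X_g\,F'(g)\,a_j\,F'(g_j)\,x_i$; replacing each $F'$ by its sign gives the corresponding entries of Table 9.2.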

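Let us mention some features of the learning methods of the two-layer neural network with one layer of nonlinear random connections. Rosenblatt called such a neural network a three-layer perceptron. Its structure was described in Chap. 2. The sharp decrease of the number of neuron inputs in the first layer and the random connections of this layer with the input space of the neural network result in the requirement to increase the number of neurons in the first layer.

Table 9.2. Expressions for the secondary optimization functional gradients in the case of F(g) = sign(g)

$$\begin{array}{c|c|c}
\text{Secondary optimization functional} & \dfrac{\partial(\cdot)}{\partial a_j} & \dfrac{\partial(\cdot)}{\partial a_{ij}} \\ \hline
|\alpha_{1g}|,\ \big|\overline{x_g(n)}\big| & -\operatorname{sign}\big[\overline{x_g(n)}\big]\,\overline{y_j(n)} & -a_j\operatorname{sign}\big[\overline{x_g(n)}\big]\,\overline{\operatorname{sign}x_i(n)} \\
\alpha_{2g},\ \overline{x_g^2(n)} & -2\,\overline{x_g(n)\,y_j(n)} & -2a_j\,\overline{x_g(n)\operatorname{sign}x_i(n)} \\
|\alpha_1|,\ \big|\overline{X_g(n)}\big| & -\operatorname{sign}\big[\overline{X_g(n)}\big]\,\overline{y_j(n)} & -\operatorname{sign}\big[\overline{X_g(n)}\big]\operatorname{sign}a_j\,\overline{\operatorname{sign}x_i(n)} \\
\alpha_2,\ \overline{X_g^2(n)} & -2\,\overline{X_g(n)\,y_j(n)} & -2\operatorname{sign}a_j\,\overline{X_g(n)\operatorname{sign}x_i(n)}
\end{array}$$

Table 9.3. Expressions for the secondary optimization functional gradients in the case of F(g) = (2/π) arctg βg

$$\begin{array}{c|c|c}
\text{Secondary optimization functional} & \dfrac{\partial(\cdot)}{\partial a_j} & \dfrac{\partial(\cdot)}{\partial a_{ij}} \\ \hline
|\alpha_{1g}|,\ \big|\overline{x_g(n)}\big| & -\operatorname{sign}\big[\overline{x_g(n)}\big]\,\overline{y_j(n)} & -\operatorname{sign}\big[\overline{x_g(n)}\big]\,a_j\,\overline{\dfrac{2}{\pi}\,\dfrac{\beta x_i(n)}{1+\beta^2 g_j^2(n)}} \\[2ex]
\alpha_{2g},\ \overline{x_g^2(n)} & -2\,\overline{x_g(n)\,y_j(n)} & -2a_j\,\overline{\dfrac{2}{\pi}\,\dfrac{\beta x_i(n)\,x_g(n)}{1+\beta^2 g_j^2(n)}} \\[2ex]
|\alpha_1|,\ \big|\overline{X_g(n)}\big| & -\operatorname{sign}\big[\overline{X_g(n)}\big]\,\overline{\dfrac{2}{\pi}\,\dfrac{\beta y_j(n)}{1+\beta^2 g^2(n)}} & -\operatorname{sign}\big[\overline{X_g(n)}\big]\,a_j\,\overline{\dfrac{4}{\pi^2}\,\dfrac{\beta^2 x_i(n)}{[1+\beta^2 g_j^2(n)][1+\beta^2 g^2(n)]}} \\[2ex]
\alpha_2,\ \overline{X_g^2(n)} & -2\,\overline{X_g(n)\,\dfrac{2}{\pi}\,\dfrac{\beta y_j(n)}{1+\beta^2 g^2(n)}} & -2a_j\,\overline{\dfrac{4}{\pi^2}\,\dfrac{\beta^2 x_i(n)\,X_g(n)}{[1+\beta^2 g_j^2(n)][1+\beta^2 g^2(n)]}}
\end{array}$$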
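In this case,

$$y(n) = F\left[\sum_{j=0}^{H_1} a_j\,F\left(\sum_{i} a_{ij}\,x_{ji}(n)\right)\right],$$

where $x_{ji}(n)$ is the signal at the i-th input of the j-th first-layer neuron. The random connections are fixed at the stage of adjustment; only the connection coefficients are adjustable. The adjustment algorithm for the neurons of the first layer has the form (for the minimum $\alpha_{2g}$ criterion)

$$\frac{\partial\,\overline{x_g^2(n)}^{\,m_g}}{\partial a_{ij}} = -2a_j\,\overline{x_g(n)\,\frac{dF(g_j)}{dg_j}\,x_{ji}(n)}^{\,m_g}.$$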


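9.4
Multilayer Neural Networks with Solution Continuum Neurons


The neural network in this case has $H_j$ neurons in the j-th layer (j = 1, ..., W). The expression for the output signal of such a neural network has the form (2.2). Let us find the partial derivatives of y(n) and g(n) with respect to the coefficients $a^{W-j}_{h_{W-j+1}h_{W-j}}$. Here $a^{W-j}_{h_{W-j+1}h_{W-j}}$ connects the signal $x^{W-j}_{h_{W-j}}(n)$ to neuron $h_{W-j+1}$ of the next layer, $g^{k}_{h_k}$ is the analog output of neuron $h_k$ of the k-th layer, and $h_W = 1$ for the single output neuron, so that $g = g^{W}_{1}$:

$$\frac{\partial y(n)}{\partial a^{W-j}_{h_{W-j+1}h_{W-j}}} = \frac{dF(g)}{dg}\sum_{h_{W-1}=1}^{H_{W-1}}\cdots\sum_{h_{W-j+2}=1}^{H_{W-j+2}}\;\prod_{\nu=1}^{j-1} a^{W-\nu}_{h_{W-\nu+1}h_{W-\nu}}\,\frac{dF\big(g^{W-\nu}_{h_{W-\nu}}\big)}{dg^{W-\nu}_{h_{W-\nu}}}\;x^{W-j}_{h_{W-j}}(n), \qquad (9.3)$$

$$\frac{\partial g(n)}{\partial a^{W-j}_{h_{W-j+1}h_{W-j}}} = \sum_{h_{W-1}=1}^{H_{W-1}}\cdots\sum_{h_{W-j+2}=1}^{H_{W-j+2}}\;\prod_{\nu=1}^{j-1} a^{W-\nu}_{h_{W-\nu+1}h_{W-\nu}}\,\frac{dF\big(g^{W-\nu}_{h_{W-\nu}}\big)}{dg^{W-\nu}_{h_{W-\nu}}}\;x^{W-j}_{h_{W-j}}(n). \qquad (9.4)$$

For $j \le 2$ the sums are absent, and for $j = 1$ the product equals 1.

Tables 9.4 and 9.5 show expressions for the estimation of the secondary optimization functional gradients in the cases of arbitrary F and $F = \operatorname{sign}(g)$. In the latter case, $F(g) = \operatorname{sign}(g)$ and the derivatives $dF/dg$ are replaced by their signs, which equal +1 for all layers $j \neq W$; this significantly simplifies the expressions for the gradients.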
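Table 9.4. Expressions for the estimation of the secondary optimization functional gradients in the case of arbitrary F

$$\begin{array}{c|c}
\text{Secondary optimization functional} & \dfrac{\partial(\cdot)}{\partial a^{W-j}_{h_{W-j+1}h_{W-j}}} \\ \hline
|\alpha_{1g}|,\ \big|\overline{x_g(n)}\big| & -\operatorname{sign}\big[\overline{x_g(n)}\big]\,\overline{\dfrac{\partial g(n)}{\partial a^{W-j}_{h_{W-j+1}h_{W-j}}}} \\[2ex]
\alpha_{2g},\ \overline{x_g^2(n)} & -2\,\overline{x_g(n)\,\dfrac{\partial g(n)}{\partial a^{W-j}_{h_{W-j+1}h_{W-j}}}} \\[2ex]
|\alpha_1|,\ \big|\overline{X_g(n)}\big| & -\operatorname{sign}\big[\overline{X_g(n)}\big]\,\overline{\dfrac{\partial y(n)}{\partial a^{W-j}_{h_{W-j+1}h_{W-j}}}} \\[2ex]
\alpha_2,\ \overline{X_g^2(n)} & -2\,\overline{X_g(n)\,\dfrac{\partial y(n)}{\partial a^{W-j}_{h_{W-j+1}h_{W-j}}}}
\end{array}$$

Here $\partial y(n)/\partial a^{W-j}_{h_{W-j+1}h_{W-j}}$ and $\partial g(n)/\partial a^{W-j}_{h_{W-j+1}h_{W-j}}$ are given by (9.3) and (9.4).

Table 9.5. Expressions for the estimation of the secondary optimization functional gradients in the case of F = sign(g)

$$\begin{array}{c|c}
\text{Secondary optimization functional} & \dfrac{\partial(\cdot)}{\partial a^{W-j}_{h_{W-j+1}h_{W-j}}} \\ \hline
|\alpha_{1g}|,\ \big|\overline{x_g(n)}\big| & -\operatorname{sign}\big[\overline{x_g(n)}\big]\,\overline{S^{W-j}_{h_{W-j+1}}\operatorname{sign}x^{W-j}_{h_{W-j}}(n)} \\
\alpha_{2g},\ \overline{x_g^2(n)} & -2\,\overline{x_g(n)\,S^{W-j}_{h_{W-j+1}}\operatorname{sign}x^{W-j}_{h_{W-j}}(n)} \\
|\alpha_1|,\ \big|\overline{X_g(n)}\big| & -\operatorname{sign}\big[\overline{X_g(n)}\big]\,\overline{\operatorname{sign}S^{W-j}_{h_{W-j+1}}\operatorname{sign}x^{W-j}_{h_{W-j}}(n)} \\
\alpha_2,\ \overline{X_g^2(n)} & -2\,\overline{X_g(n)\operatorname{sign}S^{W-j}_{h_{W-j+1}}\operatorname{sign}x^{W-j}_{h_{W-j}}(n)}
\end{array}$$

where

$$S^{W-j}_{h_{W-j+1}} = \sum_{h_{W-1}=1}^{H_{W-1}}\cdots\sum_{h_{W-j+2}=1}^{H_{W-j+2}}\;\prod_{\nu=1}^{j-1} a^{W-\nu}_{h_{W-\nu+1}h_{W-\nu}}.$$
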
9.5
Design of Neural Networks with Closed Cycle Adjustment under
Constraints upon Variables


Let us consider the equality-type and inequality-type constraints upon the adjustable coefficients of multilayer neural networks that were presented in Chap. 8. Such neural networks are characterized by constraints upon the set of coefficients of the whole neural network, upon the set of coefficients of each separate layer, and upon the set of coefficients of each separate neuron of the neural network.


Consequently, one obtains for a two-layer neural network

$$\sum_{j=0}^{H_1} a_j + \sum_{j=0}^{H_1}\sum_{i=0}^{N} a_{ij} = \alpha, \qquad (9.5a)$$

$$\sum_{j=0}^{H_1}\sum_{i=0}^{N} a_{ij} = \alpha^{(1)}, \qquad \sum_{j=0}^{H_1} a_j = \alpha^{(2)}, \qquad (9.5b)$$

$$\sum_{i=0}^{N} a_{ij} - \alpha_j = 0, \qquad \sum_{j=0}^{H_1} a_j - \alpha = 0, \qquad j = 0,\dots,H_1. \qquad (9.5c)$$


The inequality-type constraints have a form close to that presented in Sect. 8.5.


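Neural network in the form of a neuron. In the case of the $|\alpha_{1g}|$ minimization criterion and constraints (9.5a), the system (9.1) takes the following form:

$$\begin{bmatrix} a(n+1) \\ \lambda(n+1) \end{bmatrix} = \begin{bmatrix} a(n) \\ \lambda(n) \end{bmatrix} + K^{*}\begin{bmatrix} -\operatorname{sign}\left[\overline{x_g(n)}^{\,m_g}\right]\overline{x(n)}^{\,m_g} + \mathbf{1}\,\lambda(n) \\ \sum_{j=0}^{N} a_j(n) - \alpha \end{bmatrix},$$

where $\mathbf{1} = (1,\dots,1)^{T}$.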

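In the case of the $|\alpha_1|$ minimization criterion and inequality-type constraints (Chap. 8) upon the adjustable coefficients, the recurrent relationship for the closed-loop neural network design has the form

$$a(n+1) = a(n) + K_{aa}(n)\left[-\operatorname{sign}X_g(n)\operatorname{sign}x(n) + \begin{pmatrix} \lambda_1 - \lambda_{N+1} \\ \vdots \\ \lambda_N - \lambda_{2N} \end{pmatrix}\right] + K_{a\lambda}(n)\begin{pmatrix} a(n) - a_{\max} \\ a_{\min} - a(n) \end{pmatrix},$$

$$\lambda(n+1) = \max\left\{0,\ \lambda(n) + K_{\lambda a}(n)\left[-\operatorname{sign}X_g(n)\operatorname{sign}x(n) + \begin{pmatrix} \lambda_1 - \lambda_{N+1} \\ \vdots \\ \lambda_N - \lambda_{2N} \end{pmatrix}\right] + K_{\lambda\lambda}(n)\begin{pmatrix} a(n) - a_{\max} \\ a_{\min} - a(n) \end{pmatrix}\right\}.$$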


Two-layer neural network. Let us consider the case of constraints upon the coefficients 
of the multilayer neural network. The basic recurrent expressions for the closed-loop 
neural network design are given below. 
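In the case of equality-type constraints: (a) the second layer, (b) the first layer. Here $\mathbf{a}(n) = (a_0(n),\dots,a_{H_1}(n))$ is the coefficient vector of the second-layer neuron, $\mathbf{a}_j(n) = (a_{0j}(n),\dots,a_{Nj}(n))$ is that of the j-th first-layer neuron, $\mathbf{y}(n)$ and $\mathbf{x}(n)$ are their input vectors, and $\mathbf{1} = (1,\dots,1)^{T}$.

(a)

$$\mathbf{a}(n+1) = \mathbf{a}(n) + K_{11}(n)\left[-2\,\overline{X_g(n)\,\frac{dF(g)}{dg}\bigg|_{g=g(n)}\mathbf{y}(n)}^{\,m_g} + \mathbf{1}\,\lambda(n)\right] + K_{12}(n)\left[\sum_{j=0}^{H_1} a_j(n) - \alpha\right],$$

$$\lambda(n+1) = \lambda(n) + K_{21}(n)\left[-2\,\overline{X_g(n)\,\frac{dF(g)}{dg}\bigg|_{g=g(n)}\mathbf{y}(n)}^{\,m_g} + \mathbf{1}\,\lambda(n)\right] + K_{22}(n)\left[\sum_{j=0}^{H_1} a_j(n) - \alpha\right];$$

(b)

$$\mathbf{a}_j(n+1) = \mathbf{a}_j(n) + K^{j}_{11}(n)\left[-2a_j(n)\,\overline{X_g(n)\,\frac{dF(g)}{dg}\bigg|_{g=g(n)}\frac{dF(g_j)}{dg_j}\bigg|_{g_j=g_j(n)}\mathbf{x}(n)}^{\,m_g} + \mathbf{1}\,\lambda_j(n)\right] + K^{j}_{12}(n)\left[\sum_{i=0}^{N} a_{ij}(n) - \alpha_j\right],$$

$$\lambda_j(n+1) = \lambda_j(n) + K^{j}_{21}(n)\left[-2a_j(n)\,\overline{X_g(n)\,\frac{dF(g)}{dg}\bigg|_{g=g(n)}\frac{dF(g_j)}{dg_j}\bigg|_{g_j=g_j(n)}\mathbf{x}(n)}^{\,m_g} + \mathbf{1}\,\lambda_j(n)\right] + K^{j}_{22}(n)\left[\sum_{i=0}^{N} a_{ij}(n) - \alpha_j\right].$$

In the case of inequality-type constraints: (a) the second layer, (b) the first layer:

(a)

$$\mathbf{a}(n+1) = \mathbf{a}(n) + K_{aa}(n)\left[-2\,\overline{X_g(n)\,\frac{dF(g)}{dg}\bigg|_{g=g(n)}\mathbf{y}(n)}^{\,m_g} + Q\lambda(n)\right] + K_{a\lambda}(n)\,q[\mathbf{a}(n)],$$

$$\lambda(n+1) = \max\left\{0,\ \lambda(n) + K_{\lambda a}(n)\left[-2\,\overline{X_g(n)\,\frac{dF(g)}{dg}\bigg|_{g=g(n)}\mathbf{y}(n)}^{\,m_g} + Q\lambda(n)\right] + K_{\lambda\lambda}(n)\,q[\mathbf{a}(n)]\right\};$$

(b)

$$\mathbf{a}_j(n+1) = \mathbf{a}_j(n) + K^{j}_{aa}(n)\left[-2a_j(n)\,\overline{X_g(n)\,\frac{dF(g)}{dg}\bigg|_{g=g(n)}\frac{dF(g_j)}{dg_j}\bigg|_{g_j=g_j(n)}\mathbf{x}(n)}^{\,m_g} + Q\lambda_j(n)\right] + K^{j}_{a\lambda}(n)\,q[\mathbf{a}_j(n)],$$

$$\lambda_j(n+1) = \max\left\{0,\ \lambda_j(n) + K^{j}_{\lambda a}(n)\left[-2a_j(n)\,\overline{X_g(n)\,\frac{dF(g)}{dg}\bigg|_{g=g(n)}\frac{dF(g_j)}{dg_j}\bigg|_{g_j=g_j(n)}\mathbf{x}(n)}^{\,m_g} + Q\lambda_j(n)\right] + K^{j}_{\lambda\lambda}(n)\,q[\mathbf{a}_j(n)]\right\}.$$
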
Here Q and q are defined in the same way as in Chap. 8. 
These algorithms can be easily generalized for the case of an arbitrary number of 
layers and particular arbitrary forms of constraints. 
9.6 
Implementation of Primary Optimization Criteria for Neurons 
with Two Solutions 


Let us consider the minimum average risk function criterion. The expression for the 
transformed discrete error can be represented in the form 


Xe = (e— xp) FU(-2A+0(E-)+OB+OME+ D+ F(e+)le 


The gradient required for the closed-loop neural network design is

$$\frac{\partial x'^2_g}{\partial a_i} = 2x'_g(n)\,\frac{\partial x'_g}{\partial a_i},$$


or in another form, 


12 


Xg 7 1 | it 1 i | 1 | 
8a; {e y)FIC2A + Oe) + 2B +O(E+DI4 vle+ ye} 


(9.6) 


signs |e al 2A+1)(€-1)+(2B+C)(é-4 o}} 


The values A, B, and C are determined by (7.13) and (7.14). In the case of (7.14), 


SE ang sign | hy )(€-1) + (Lr ba)(e+0)} (9.6a) 


1 




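In the case of (7.14a), one obtains for the transformed discrete error

$$x'_g = \frac{1}{4}(\varepsilon + y)\left[Z_{11}(\varepsilon - 1) + Z_{22}(\varepsilon + 1)\right] + \frac{1}{4}(\varepsilon - y)\left[Z_{21}(\varepsilon - 1) + Z_{12}(\varepsilon + 1)\right]. \qquad (9.7)$$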


After corresponding transformations, we get the expression 


1, 
5 SBM {a (Zi Zi)xg+y(Z1 Zyy)221| 
i (9.8) 
+(e e)|(Z3, — Zip )xg + y(Zy2 —2Z)2Zm}} 


that coincides with the above obtained results at Z,, = Z,,=0. The estimation for the 
gradient of the second distribution moment can be derived using the following expression for $x'_g$:


$$4x'_g = (1+y)\left[(1+\varepsilon)Z_{22} + (1-\varepsilon)Z_{21}\right] + (1-y)\left[(1+\varepsilon)Z_{12} + (1-\varepsilon)Z_{11}\right]$$


In the above expressions 


Z = 
kpk = a|4kpk 
in order to provide the equality 


My 
he 
R=xg 


The minimum criterion for R under condition p,r, = p,r, is determined in the fol- 
lowing way. The gradient R* estimation (7.17) using adjustable coefficients is deter- 
mined by (9.6) with A, B and C from (7.18). The estimation of gradient R* along A is 
determined in the form of estimation of the first moment of distribution for the trans- 
formed discrete error according to (7.19) and (7.20): 


Oxy 1 1 
r i =7le y)[(—2.4, +, (e -1) + (2B, +, (+4 5 + y)Ce (9.8a) 


Expressions (9.6), (7.18), (7.20), and (9.8a) form the basis for the corresponding 
closed-loop neural network. 

The use of Z-transformation and equations (9.8) and (7.21) give the following esti- 
mation for gradient R* along A: 


Oxg 1 1 
OA =a + y)[Znle- + Zrle+))+7(€—y)[Zae€-D+ ZarlE +0] (9.9) 


where Zipk is determined by (7.22). 
The minimum criterion for R under condition p,r, = const. is determined in the
following way. The gradient R* estimation (7.24) using adjustable coefficients is deter- 
mined by (9.6) with A, B and C from (7.25). The estimation of gradient R* along A is 
determined in the form of the estimation of the first moment of distribution for the 
transformed discrete error according to (9.8a) with A,, B, and C, from (7.26). The use 
of Z,-transformation and equations (9.8), (7.27), (9.9), and (9.28) give the estimation 
for gradient R* along a; and A. 


9.7 
Implementation of Minimum Average Risk Function Criterion 
for Neurons with Continuum Solutions and K, Solutions 


According to (7.30), in the case of a neuron with a solution continuum (two pattern classes), one obtains

$$x'^2_g = \frac{1}{2}(1+\varepsilon)\,l_2(\varepsilon - X_g) + \frac{1}{2}(1-\varepsilon)\,l_1(\varepsilon - X_g) = \frac{1}{2}(1+\varepsilon)\,l_2(y) + \frac{1}{2}(1-\varepsilon)\,l_1(y), \qquad (9.9a)$$

where

$$y = \varepsilon - X_g = F(g) = F\left(\sum_{i=0}^{N} a_i x_i\right).$$

The expression for the estimate of the average risk function gradient through the current neural network signals is obtained after some transformations in the following form:

$$\frac{\partial x'^2_g}{\partial a_i} = \frac{1}{2}\,\frac{dF(g)}{dg}\,x_i\left[(1+\varepsilon)\frac{dl_2(y)}{dy} + (1-\varepsilon)\frac{dl_1(y)}{dy}\right]. \qquad (9.10)$$

In this particular case,

$$l_1(y) = (1+y)^2, \qquad l_2(y) = (1-y)^2,$$

$$l_1(-1 - X_g) = (1 - 1 - X_g)^2 = X_g^2, \qquad l_2(1 - X_g) = (1 - 1 + X_g)^2 = X_g^2.$$

Consequently,

$$\frac{\partial x'^2_g}{\partial a_i} = -2\,\frac{dF(g)}{dg}\,x_i\,X_g,$$

which corresponds to the neuron with $\alpha_2$ minimization analyzed in Sect. 9.2. Expression (9.10) gives the known expression for the estimate of gradient R in the case of two pattern classes and a neuron with two solutions in the form (9.6a).
In the case of a continuum of pattern classes,

$$x'^2_g = l(\varepsilon - X_g, \varepsilon) = l(y, \varepsilon),$$

$$\frac{\partial x'^2_g}{\partial a_i} = \frac{\partial l(y,\varepsilon)}{\partial y}\,\frac{dF(g)}{dg}\,x_i. \qquad (9.11)$$

Expression (9.10) for two pattern classes is a particular case of (9.11). The function $\partial l(y,\varepsilon)/\partial y$ in (9.11) must be given a priori.
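In the case of a neuron with $K_p$ solutions ($K_p$ pattern classes), the output signal has the form

$$y = 1 + \frac{1}{2}\sum_{k_p=1}^{K_p-1}\left[\operatorname{sign}\left(g - a_{0k_p}\right) + 1\right], \qquad g = \sum_{i=0}^{N} a_i x_i.$$

Similar to the previous case,

$$\frac{\partial x'^2_g}{\partial a_i} = \frac{\partial l(y,\varepsilon)}{\partial y}\,\frac{\partial y}{\partial a_i}, \qquad (9.12)$$

where $\partial l(y,\varepsilon)/\partial y$ is a $(K_p \times K_p)$-matrix whose elements are the first order differences of the corresponding discrete function $l(y,\varepsilon)$. In the particular case, this matrix has the following form:

$$\frac{\partial l(y,\varepsilon)}{\partial y} = \begin{bmatrix} 0 & 1 & 1 & \cdots & 1 \\ -1 & 0 & 1 & \cdots & 1 \\ \vdots & & \ddots & & \vdots \\ -1 & -1 & -1 & \cdots & 0 \end{bmatrix}. \qquad (9.13)$$

In expression (9.12),

$$\frac{\partial y}{\partial a_i} = \frac{1}{2}\sum_{k_p=1}^{K_p-1}\frac{\partial}{\partial a_i}\operatorname{sign}\left(g - a_{0k_p}\right) = \frac{1}{2}\sum_{k_p=1}^{K_p-1}\lim_{\beta\to\infty}\frac{2}{\pi}\,\frac{\beta\,x_i}{1 + \beta^2\left(g - a_{0k_p}\right)^2}.$$

Consequently,

$$\operatorname{sign}\frac{\partial y}{\partial a_i} = \operatorname{sign} x_i, \qquad\text{and finally}\qquad \frac{\partial x'^2_g}{\partial a_i} = \frac{\partial l(y,\varepsilon)}{\partial y}\operatorname{sign} x_i.$$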
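The $K_p$-level neuron and its pseudo-gradient can be sketched directly from these expressions. The sketch assumes that matrix (9.13) has rows indexed by $\varepsilon$ and columns by y, so that its element is $\operatorname{sign}(y - \varepsilon)$; the thresholds $a_{0k_p}$ are passed as a list, and all function names are illustrative.

```python
def sign(v):
    return (v > 0) - (v < 0)

def multilevel_output(g, thresholds):
    """Neuron with K_p = len(thresholds) + 1 solutions:
    y = 1 + 1/2 * sum_k [sign(g - a_0k) + 1], so y takes the values 1, ..., K_p."""
    return 1 + 0.5 * sum(sign(g - t) + 1 for t in thresholds)

def loss_difference(y, eps):
    """Element of the first-difference matrix (9.13), assuming rows are indexed
    by eps and columns by y: 0 on the diagonal, +1 above it, -1 below it."""
    return sign(y - eps)

def pseudo_gradient(a, x, eps, thresholds):
    """Estimate of d x'_g^2 / d a_i from (9.12), with dy/da_i replaced by sign x_i."""
    g = sum(ai * xi for ai, xi in zip(a, x))
    y = multilevel_output(g, thresholds)
    return [loss_difference(y, eps) * sign(xi) for xi in x]
```

For thresholds $(-1, 0, 1)$ and $g = 0.5$ the output is level 3; if the true class is 1, the pseudo-gradient is $+\operatorname{sign}x_i$, so a descent step lowers g toward the correct level.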
9.8
Implementation of the Minimum Average Risk Function Criterion 
for Neural Networks with N* Output Channels (Neuron Layer) 


Let us consider closed-loop neural networks with $N^*$ output channels. The optimal models of such neural networks and their secondary optimization functionals were analyzed in Chaps. 6 and 7. The case of equal dimensionality of $\varepsilon$ and y is assumed below.

In calculations of discrete error transformations, the output signal has $K_p$ gradations in each channel. The measured vector of the discrete error has the form

$$(\varepsilon_1,\dots,\varepsilon_{N^*}) - (y_1,\dots,y_{N^*}) = (\varepsilon_1 - y_1,\dots,\varepsilon_{N^*} - y_{N^*}).$$

This expression is multiplied by the scalar (7.33), and the norm of the resultant vector is calculated. Then

$$x'^2_g = l(k_1,\dots,k_{N^*};\ k_{1p},\dots,k_{N^*p})$$

if

$$(\varepsilon_1,\dots,\varepsilon_{N^*}) = (k_1,\dots,k_{N^*}) \qquad\text{and}\qquad (y_1,\dots,y_{N^*}) = (k_{1p},\dots,k_{N^*p}).$$

Consideration of the general case of $K_p$ gradations of the neural network output signal in each channel, of the form

$$y_{j^*} = 1 + \frac{1}{2}\sum_{k_p=1}^{K_p-1}\left[\operatorname{sign}\left(g_{j^*} - a_{0k_p}\right) + 1\right], \qquad g_{j^*} = \sum_{i=0}^{N} a_{ij^*}x_i, \quad j^* = 1,\dots,N^*,$$

does not differ in principle. Let us therefore consider the case $K_p = 2$:

$$y_{j^*} = \operatorname{sign} g_{j^*}.$$


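It can be shown that

$$\frac{\partial x'^2_g}{\partial a_{ij^*}} = \frac{\partial}{\partial y_{j^*}}\,l(\varepsilon_1,\dots,\varepsilon_{N^*}, y_1,\dots,y_{N^*})\,\frac{\partial y_{j^*}}{\partial a_{ij^*}}. \qquad (9.14)$$

Here $\partial l(\varepsilon_1,\dots,\varepsilon_{N^*}, y_1,\dots,y_{N^*})/\partial y_{j^*}$ is a $(2^{N^*} \times 2^{N^*})$-matrix. The gradient is calculated as the corresponding first order difference of the discrete functions along $y_{j^*}$. This matrix can have the form (9.13) in some particular cases. The function $\partial y_{j^*}/\partial a_{ij^*}$ is determined by its sign:

$$\operatorname{sign}\frac{\partial y_{j^*}}{\partial a_{ij^*}} = \operatorname{sign} x_i.$$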


Let the recognition system have a solution continuum in each of the $N^*$ channels. Let the function F be the same for each output channel. Let the transformed discrete error have the first distribution moment equal to the average risk function R. This error is calculated as the sum of squared components of the vector of the measured discrete error transformed according to (7.34):

$$x'^2_g = l(\varepsilon_1,\dots,\varepsilon_{N^*}, y_1,\dots,y_{N^*}).$$

In this case,

$$y_{j^*} = F(g_{j^*}) = F\left(\sum_{i=0}^{N} a_{ij^*}x_i\right),$$

and finally

$$\frac{\partial x'^2_g}{\partial a_{ij^*}} = \frac{\partial}{\partial y_{j^*}}\,l(\varepsilon_1,\dots,\varepsilon_{N^*}, y_1,\dots,y_{N^*})\,\frac{dF(g_{j^*})}{dg_{j^*}}\,x_i(n).$$

This expression forms the basis for the design of the corresponding neural network with closed-cycle adjustment.


9.9 
Implementation of the Minimum Average Risk Function Criterion 
for Multilayer Neural Networks 


Three types of closed-cycle adjustment algorithms for multilayer neural networks are rep- 
resented below as the implementation of the minimum average risk function criterion. 

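Expression (9.9a) is valid for the neural network of two pattern classes with one output channel ($N^* = 1$) and an arbitrary open-loop structure. The estimate of the average risk function gradient has, in general, the following form:

$$\frac{\partial\,\overline{x'^2_g}^{\,m_g}}{\partial a^{W-j}_{h_{W-j+1}h_{W-j}}} = \overline{\frac{\partial x'^2_g}{\partial a^{W-j}_{h_{W-j+1}h_{W-j}}}}^{\,m_g}.$$

Here

$$\frac{\partial x'^2_g(n)}{\partial a^{W-j}_{h_{W-j+1}h_{W-j}}} = \frac{1}{2}(1+\varepsilon)\,\frac{dl_2(y)}{dy}\,\frac{dF(g)}{dg}\,\frac{\partial g(n)}{\partial a^{W-j}_{h_{W-j+1}h_{W-j}}} + \frac{1}{2}(1-\varepsilon)\,\frac{dl_1(y)}{dy}\,\frac{dF(g)}{dg}\,\frac{\partial g(n)}{\partial a^{W-j}_{h_{W-j+1}h_{W-j}}},$$

where $\partial g(n)/\partial a^{W-j}_{h_{W-j+1}h_{W-j}}$ is determined by relationship (9.4). Finally,

$$\frac{\partial x'^2_g}{\partial a^{W-j}_{h_{W-j+1}h_{W-j}}} = \frac{1}{2}\,\frac{dF(g)}{dg}\,\frac{\partial g(n)}{\partial a^{W-j}_{h_{W-j+1}h_{W-j}}}\left[(1+\varepsilon)\frac{dl_2(y)}{dy} + (1-\varepsilon)\frac{dl_1(y)}{dy}\right].$$

In the particular case of a multilayer neural network with full connections between layers,

$$\frac{\partial x'^2_g}{\partial a^{W-j}_{h_{W-j+1}h_{W-j}}} = \frac{1}{2}\left[(1+\varepsilon)\frac{dl_2(y)}{dy} + (1-\varepsilon)\frac{dl_1(y)}{dy}\right]\frac{dF(g)}{dg}\sum_{h_{W-1}=1}^{H_{W-1}}\cdots\sum_{h_{W-j+2}=1}^{H_{W-j+2}}\;\prod_{\nu=1}^{j-1} a^{W-\nu}_{h_{W-\nu+1}h_{W-\nu}}\,\frac{dF\big(g^{W-\nu}_{h_{W-\nu}}\big)}{dg^{W-\nu}_{h_{W-\nu}}}\;x^{W-j}_{h_{W-j}}(n).$$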
In the case of multilayer neural networks with neurons of two solutions, 
ax?" al aby) di,(y) ; 
5 g =e a +(1—€) i Signy 
Ahw — jw —j ¢ ” (9.15) 
j+2 
[1 sign aigy phy —y-s 
n= 


The consideration of multilayer neural networks with continuum pattern classes and solutions involves no difficulties of principle. Let us therefore consider the case of K gradations of the signal levels $\varepsilon(n)$ and $y(n)$. The open-loop neural network with $N^* = 1$ is described by the expression

$$y = 1 + \frac{1}{2}\sum_{k_p=1}^{K_p}\left[\operatorname{sign}\left(g^{W}_{h_W} - a_{h_W k_p}\right) + 1\right],$$

where $g^{W}_{h_W}$ is determined by expression (2.7) for the neural network with a continuum of solutions. The expression for the optimization functional gradient is

$$\frac{\partial \overline{x_g'^2}^{m_n}}{\partial a_{h_{W-j+1}h_{W-j}}} = \overline{\frac{\partial}{\partial y}\,l(\varepsilon, y)\,\frac{\partial y}{\partial a_{h_{W-j+1}h_{W-j}}}}^{m_n}$$
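The matrix $(\partial/\partial y)\,l(\varepsilon, y)$ is determined here in the same way as in Sect. 9.7. Taking into account that

$$\operatorname{sign}\frac{\partial y}{\partial a_{h_W h_{W-1}}} = \operatorname{sign}x_{h_{W-1}},$$

the expression for the estimate of the average risk function gradient for the neural network with neurons of two solutions takes the form

$$\frac{\partial \overline{x_g'^2}^{m_n}}{\partial a_{h_{W-j+1}h_{W-j}}} = \overline{\frac{\partial}{\partial y}\,l(\varepsilon, y)\,\operatorname{sign}\!\left[\sum_{h_{W-1}=1}^{H_{W-1}}\cdots\sum_{h_{W-j+2}=1}^{H_{W-j+2}}\prod_{\nu=1}^{j-1}a_{h_{W-\nu+1}h_{W-\nu}}\right]x_{h_{W-j}}(n)}^{m_n}\quad(9.16)$$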

This expression represents a basis for the design of the corresponding closed-loop 
neural network. 
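The open-loop output with K gradations used before (9.16) is a staircase of threshold elements: every threshold $a_{k_p}$ that g reaches raises y by one level. A minimal Python sketch, assuming ordered thresholds and the convention sign(0) = +1 (the text does not fix the value of sign at zero); the function names are hypothetical:

```python
from typing import Sequence


def sign(v: float) -> int:
    """Two-valued threshold element; sign(0) = +1 is an assumed convention."""
    return 1 if v >= 0 else -1


def multilevel_output(g: float, thresholds: Sequence[float]) -> int:
    """y = 1 + (1/2) * sum_k [sign(g - a_k) + 1].

    Each term sign(g - a_k) + 1 is 0 or 2, so y equals 1 plus the number
    of thresholds a_k that g reaches.
    """
    total = sum(sign(g - a_k) + 1 for a_k in thresholds)
    return 1 + total // 2


if __name__ == "__main__":
    levels = [-1.0, 0.0, 1.0]
    print([multilevel_output(g, levels) for g in (-2.0, -0.5, 0.5, 2.0)])
```

With three thresholds the script prints `[1, 2, 3, 4]`, i.e. four output gradations.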

Let us consider the neural network with $N^*$ output channels and two amplitude gradations of the output signal in each channel. Here

$$y_k = \operatorname{sign}g_k = \operatorname{sign}g^{W}_{h_W},\qquad h_W = 1,\dots,N^*$$

Then, given the $(2^{N^*}\times 2^{N^*})$-matrix

$$\frac{\partial}{\partial y_k}\,l(\varepsilon_1,\dots,\varepsilon_{N^*},y_1,\dots,y_{N^*}),$$

one can obtain an adjustment algorithm similar to that of Sect. 9.7 for the last-layer neurons and to that of Sect. 9.8 for the remaining neurons.


9.10 
Development of Closed-Loop Neural Networks 
of Non-Stationary Patterns 


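The main difference from the case of stationary patterns emerges here in the design of the neural network adjustment algorithm. Let us consider a one-dimensional neural network with minimization of the functional $\alpha_2$ (the second moment of the discrete error) in the closed cycle.

In this case,

$$\overline{x_g^2(n\Delta T)}^{m_n} = \overline{\varepsilon^2(n\Delta T)}^{m_n} + \overline{x^2(n\Delta T)}^{m_n} + a_0^2(n\Delta T) - 2\,\overline{\varepsilon(n\Delta T)\,x(n\Delta T)}^{m_n} + 2a_0(n\Delta T)\,\overline{\varepsilon(n\Delta T)}^{m_n} - 2a_0(n\Delta T)\,\overline{x(n\Delta T)}^{m_n}$$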


The averaging here must be performed across the set of realizations of the non-stationary random process at the time instant $n\Delta T$. Usually, only a single realization is available. The value $\overline{x_g^2(n\Delta T)}^{m_n}$ is therefore obtained by averaging across time in the memory interval $m_n$, subject to the additional conditions of a possible convergence of the process to the stationary state and of a priori information about the changes of the distribution parameters of the non-stationary random signal. In order to express the functional gradient estimate in algebraic form, the neural network parameters (the adjustable coefficient $a_0$) must be assumed constant in the averaging interval $m_n$. In this case,

$$\frac{\partial \overline{x_g^2(n\Delta T)}^{m_n}}{\partial a_0} = 2\,\overline{x_g(n\Delta T)}^{m_n}$$


The learning algorithm in the non-stationary case is determined by the relationship

$$a_0\big[(n+m_n)\Delta T\big] = a_0(n\Delta T) + K^*\,\overline{x_g(n\Delta T)}^{m_n}$$
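The block-wise update can be sketched as follows. This is a minimal illustration, assuming the one-dimensional model implied by the expansion above, i.e. $x_g = \varepsilon - x + a_0$, with $a_0$ held fixed inside each memory interval of $m_n$ samples. The step is written as an explicit descent with a hypothetical positive gain `k`, which corresponds to choosing $K^* = -2k$ in the "+K*" form of the relationship:

```python
from typing import List, Sequence


def adjust_threshold(eps: Sequence[float], x: Sequence[float], m_n: int,
                     k: float, a0: float = 0.0) -> List[float]:
    """Block-wise closed-loop adjustment of the threshold a0.

    a0 is held constant inside each memory interval of m_n samples; the time
    average of x_g = eps - x + a0 is formed over the interval, and a0 then takes
    one gradient step on the second moment, whose derivative is 2 * mean(x_g).
    Returns the value of a0 after every interval (initial value first).
    """
    history = [a0]
    for start in range(0, len(x) - m_n + 1, m_n):
        block = range(start, start + m_n)
        mean_xg = sum(eps[n] - x[n] + a0 for n in block) / m_n
        a0 = a0 - 2.0 * k * mean_xg      # i.e. K* = -2k in the "+K*" form
        history.append(a0)
    return history
```

For a slowly drifting input mean, the returned history lags the drift by roughly one memory interval; a larger $m_n$ smooths the estimate but increases this lag, which is why the text ties the choice of the averaging filter to a priori information about the non-stationarity.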


In order to design a closed-loop neural network, one needs information about the character of the changes of the distribution of the signal $x_g(n\Delta T)$. This information can be obtained unambiguously from information about the character of the changes of the input signal distribution parameters on the memory interval of the adjustment unit, together with information about the neural network structure. If one assumes that the pattern assemblages are normally distributed with a time-dependent mathematical expectation and that the random and deterministic components of the random signal $x_g(n\Delta T)$ are statistically independent, then the hypothesis about the changes of the mathematical expectation is the same as for the signal $x(n\Delta T)$. Optimal filtering of the non-stationary signal, together with this hypothesis about the first distribution moment, allows one to use in the adjustment unit the same filter as for the estimation of the secondary optimization functional gradient. The synthesis of such filters is considered in [9-7]. Here the hypotheses about the character of the changes of the first distribution moments in the neural network memory interval are taken to be the same for both classes. If this is not so, then a higher-order hypothesis must be adopted for the synthesis of the estimation filter of $\overline{x_g(n\Delta T)}^{m_n}$.

The analysis of the corresponding expressions for non-stationary patterns shows that the estimation of the secondary optimization functional gradient is a problem of random signal filtering. The characteristics of the non-stationary implementation of the secondary optimization functional were determined above under the assumption that some a priori information about the characteristics of the non-stationary input patterns is available. However, this method is rather complicated for multidimensional and multilayer neural networks and for secondary optimization functionals related to the discrete error. To overcome this complexity, we introduce additional a priori information about the neural network input signal. This information allows one to identify the class of non-stationary pattern assemblages for which a priori information about the character of the time changes of the gradient distribution parameters is sufficient. On the one hand, this approach simplifies the filter synthesis procedure in the adjustment block. On the other hand, it allows one to design closed-cycle adjustment algorithms with coefficient correction at each step n. In the previous cases, such a correction was performed after every $m_n$ steps.

The results of multidimensional filter synthesis given in [9-8] can be used for neural networks with open-cycle adjustment. They can also be used in the design of neural networks with closed-cycle adjustment, for estimating the secondary optimization functional gradients of non-stationary pattern neural networks.


9.11 
Development of Closed-Cycle Adjustable Neural Networks 
with Cross and Backward Connections 


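Below we consider only the case in which the second moment of the discrete error distribution is used as the secondary optimization functional.

The two-layer open-loop neural network with cross connections for a pattern recognition system is described by the following expression (see Chap. 2):

$$y = F\left[\sum_{j=1}^{H_1}a_j\,F\left(\sum_{i=0}^{N}a_{ji}x_i\right) + \sum_{i=0}^{N}a_i x_i\right]$$

Similar to the above cases, here

$$\frac{\partial \overline{x_g^2}^{m_n}}{\partial a} = -2\,\overline{x_g\,\frac{\partial y}{\partial a}}^{m_n}\quad(9.17)$$

In this case,

$$\frac{\partial y}{\partial a_i} = \frac{dF(g)}{dg}\,x_i;\qquad \frac{\partial y}{\partial a_{ji}} = \frac{dF(g)}{dg}\,a_j\,\frac{dF(g_j)}{dg_j}\,x_i;\qquad \frac{\partial y}{\partial a_j} = \frac{dF(g)}{dg}\,F(g_j)$$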


These expressions form the basis for the design of the corresponding closed-loop 
neural network. 

The open-loop neural network in the form of a neuron with backward connections is described by the following expression (Chap. 2):

$$y(n) = F\left[\sum_{i=0}^{N}a_i x_i(n) + a_y\,y(n-1)\right]\quad(9.18)$$

Let us consider the case $m_n = 1$. If $m_n = \text{const}$, only the condition that $y(n-1)$ is independent of $a_i$ must be satisfied. Equation (9.18) gives

$$\frac{\partial y(n)}{\partial a_i} = \frac{dF(g)}{dg}\,x_i(n);\qquad \frac{\partial y(n)}{\partial a_y} = \frac{dF(g)}{dg}\,y(n-1)$$
Consequently, taking into account (9.17), the recurrent relationship for the design of the corresponding closed-loop neural network is

$$\begin{bmatrix}a_i(n+1)\\ a_y(n+1)\end{bmatrix} = \begin{bmatrix}a_i(n)\\ a_y(n)\end{bmatrix} + 2K^*\,\overline{x_g(n)\,\frac{dF(g)}{dg}\begin{bmatrix}x_i(n)\\ y(n-1)\end{bmatrix}}^{m_n}\quad(9.18a)$$
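One step of (9.18a) with $m_n = 1$ can be sketched as below. This is a minimal illustration under stated assumptions: the activation F is taken to be tanh (the text leaves F general), the gain $K^*$ is a scalar `k`, and, as the text requires, $y(n-1)$ is treated as independent of the coefficients when differentiating. The function name is hypothetical:

```python
import math
from typing import List, Sequence, Tuple


def feedback_neuron_step(a: Sequence[float], a_y: float, x: Sequence[float],
                         y_prev: float, eps: float, k: float
                         ) -> Tuple[List[float], float, float]:
    """One step of (9.18a), m_n = 1, for y(n) = F(sum_i a_i x_i(n) + a_y y(n-1)).

    F = tanh is an assumed activation. Returns (new a, new a_y, y(n)).
    """
    g = sum(ai * xi for ai, xi in zip(a, x)) + a_y * y_prev
    y = math.tanh(g)
    dF = 1.0 - y * y                 # dF/dg for tanh
    x_g = eps - y                    # discrete error
    step = 2.0 * k * x_g * dF        # 2 K* x_g dF/dg
    new_a = [ai + step * xi for ai, xi in zip(a, x)]
    new_a_y = a_y + step * y_prev    # feedback input y(n-1) plays the role of x_i
    return new_a, new_a_y, y
```

In a training loop, the returned `y` is passed back as `y_prev` on the next call, so the feedback coefficient is adjusted with the same rule as the ordinary weights.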


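Let us consider the two-layer neural network with backward connections. The open-loop neural network is described as follows:

$$y(n) = F[g(n)];\qquad g(n) = \sum_{j=1}^{H_1}a_j\,y_j(n) + a_y\,y(n-1);$$
$$y_j(n) = F[g_j(n)];\qquad g_j(n) = \sum_{i=1}^{N}a_{ji}\,x_i(n) + a_{jj}\,y_j(n-1) + a_{jy}\,y(n-1)\quad(9.18b)$$

Using the transformation (9.17), one gets

$$\frac{\partial y(n)}{\partial a_j} = \frac{dF(g)}{dg}\,y_j(n);\qquad \frac{\partial y(n)}{\partial a_y} = \frac{dF(g)}{dg}\,y(n-1);\qquad \frac{\partial y(n)}{\partial a_{ji}} = \frac{dF(g)}{dg}\,a_j\,\frac{dF(g_j)}{dg_j}\,x_i(n);$$
$$\frac{\partial y(n)}{\partial a_{jj}} = \frac{dF(g)}{dg}\,a_j\,\frac{dF(g_j)}{dg_j}\,y_j(n-1);\qquad \frac{\partial y(n)}{\partial a_{jy}} = \frac{dF(g)}{dg}\,a_j\,\frac{dF(g_j)}{dg_j}\,y(n-1)\quad(9.18c)$$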


These expressions form the basis for the design of the corresponding closed-loop neural network. They can easily be generalized to neural networks with both cross and backward connections, to neural networks with an arbitrary number of layers, and to neural networks with cross and backward connections of different “logical depths”.


9.12 
Development of Closed-Loop Neural Networks in the Learning Modes 
with Arbitrary Teacher Qualification 


Learning algorithms similar in quality to the algorithms for the recovery of the distribution density are given in [9-5]. These algorithms can be obtained by performing the corresponding calculation at each step of the adjustment procedure for a multilayer neural network with fixed structure. The adjustment is performed along the vector coordinates of the corresponding modes of f(x). However, another approach, similar to the one used above for the learning mode, is also possible. The average risk in this case is the first moment of the distribution of the signal $x_g'$ (7.35). Consequently,

$$\frac{\partial \overline{x_g'}}{\partial a} = \overline{\frac{\partial}{\partial a}\,\rho[x - b(y)]} = -\,\overline{\frac{\partial\rho}{\partial[x - b(y)]}\,\frac{\partial b(y)}{\partial y}\,\frac{\partial y}{\partial a}}\quad(9.19)$$

In particular, for $\rho(x,b) = \|x - b\|^2$:

$$\frac{\partial\rho}{\partial y} = -2\,[x - b(y)]^{T}\,\frac{\partial b(y)}{\partial y}$$
The equation for the unknown functions b(y) is the recurrent relationship ($m_n = 1$)

$$b(y,n) = b(y,n-1) + K^*\left[x(n) - b(y,n-1)\right]\quad(9.20)$$

Equations (9.19) and (9.20) form the basis for the design of neural networks adjusted in the closed cycle in the self-learning mode. The derivative $\partial y/\partial a$ in (9.19) is determined in the same way as in the learning mode for a neural network of arbitrary structure. In the case of $K_p$ solutions, $\partial b(y)/\partial y$ is a $(K_p\times N)$-matrix obtained by solving (9.20) at the current step.

A more detailed development of closed-loop neural networks with $K_p$ solutions and $N^*$ output channels in the self-learning mode is given in Chap. 12.

The multilayer neural network adjustment algorithm in this case is the following: 


1. The initial value y(0) is calculated from the current input signal x(0) with some initial values of the neural network adjustable coefficients;

2. The column of matrix b(y,0) corresponding to y(0) is selected;

3. The neural network coefficients are adjusted according to (9.19), and the procedure is repeated starting again from step 1.
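The table of columns b(y) used in step 2 is maintained by the recurrence (9.20). A minimal sketch, assuming the network's solution index y for each input is already known and choosing the gain $K^* = 1/(\text{number of visits to } y)$, a common stochastic-approximation choice that the text does not prescribe. With this gain, b(y) is exactly the running mean of the inputs assigned to y, which is the minimizer of $\rho = \|x - b\|^2$. Here b is stored row-per-solution rather than column-per-solution; the function name is hypothetical:

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def update_prototypes(pairs: Iterable[Tuple[int, Sequence[float]]],
                      n_solutions: int, dim: int,
                      gain: Optional[float] = None) -> List[List[float]]:
    """Recurrent estimate (9.20): b(y) <- b(y) + K* [x - b(y)].

    Only the entry of the current solution y is updated (step 2 of the
    algorithm). With gain=None the assumed gain K* = 1/(visits to y) is used,
    which makes b(y) the running mean of the inputs assigned to y.
    """
    b = [[0.0] * dim for _ in range(n_solutions)]
    visits = [0] * n_solutions
    for y, x in pairs:
        visits[y] += 1
        k = 1.0 / visits[y] if gain is None else gain
        b[y] = [bj + k * (xj - bj) for bj, xj in zip(b[y], x)]
    return b
```

A fixed `gain` instead gives an exponentially weighted mean, which forgets old samples and may be preferable when the pattern distribution drifts.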


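The b(y) values at each adjustment step can be determined from the parameters and structure of the multilayer neural network. For an arbitrary teacher qualification $\gamma$,

$$x_g' = \gamma\,l(\varepsilon, y) + (1-\gamma)\,\rho[x - b(y)]\quad(9.20a)$$

Then

$$\frac{\partial \overline{x_g'}}{\partial a} = \overline{\left[\gamma\,\frac{\partial l(\varepsilon,y)}{\partial y} - (1-\gamma)\,\frac{\partial\rho}{\partial[x - b(y)]}\,\frac{\partial b(y)}{\partial y}\right]\frac{\partial y}{\partial a}}\quad(9.20b)$$

$$\frac{\partial \overline{x_g'}}{\partial b(y)} = -(1-\gamma)\,\overline{\frac{\partial\rho}{\partial[x - b(y)]}}\quad(9.20c)$$

These two equations form the basis for the design of a closed-loop neural network with arbitrary structure and arbitrary teacher qualification. The adjustment algorithm is divided into two independent parts. One of them is determined by the term $\partial y/\partial a$; it depends on the open-loop neural network structure and determines the quality of the recognition solution.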

The methods developed for multilayer neural network adjustment can also be used to adjust a neural network with several layers and fixed coefficients.

The multilayer neural network adjustment procedure (8.1) provides only a local extremum of the optimization functional. The initial values of the adjustable parameters must be chosen randomly in an interval determined by physical considerations. In this case, the full adjustment algorithm of the multilayer neural network must consist of a set of stages in which random initial conditions are introduced, the subsequent stages of adjustment (8.1), and a stage in which the adjustment results are averaged across this set (see Chaps. 8 and 12).


9.13 
Expressions for the Estimations of the Second Order Derivatives 
of the Secondary Optimization Functional 


Expressions for estimating the second-order derivatives of the second moment of the discrete error distribution (the secondary optimization functional) are given below for multilayer neural networks of different types. In the case of a solution continuum,

$$\frac{\partial^2 \overline{x_g^2}^{m_n}}{\partial a_i\,\partial a_j} = 2\,\overline{\left[\left(\frac{dF(g)}{dg}\right)^2 - x_g\,\frac{d^2F(g)}{dg^2}\right]x_i\,x_j}^{m_n}\quad(9.21)$$
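Expression (9.21) follows from differentiating (9.17) once more, and it can be checked numerically. A minimal sketch, assuming a single neuron $y = F(\sum_i a_i x_i)$ with F = tanh, one sample (no averaging), and the discrete error $x_g = \varepsilon - y$; it compares the closed-form second derivatives with central finite differences. All function names are hypothetical:

```python
import math
from typing import List, Sequence


def F(g: float) -> float:
    return math.tanh(g)              # assumed activation


def dF(g: float) -> float:
    return 1.0 - math.tanh(g) ** 2


def d2F(g: float) -> float:
    t = math.tanh(g)
    return -2.0 * t * (1.0 - t * t)


def loss(a: Sequence[float], x: Sequence[float], eps: float) -> float:
    """Single-sample x_g^2 = (eps - F(a . x))^2."""
    g = sum(ai * xi for ai, xi in zip(a, x))
    return (eps - F(g)) ** 2


def hessian_921(a, x, eps) -> List[List[float]]:
    """Second derivatives from (9.21) for one sample."""
    g = sum(ai * xi for ai, xi in zip(a, x))
    x_g = eps - F(g)
    c = 2.0 * (dF(g) ** 2 - x_g * d2F(g))
    return [[c * xi * xj for xj in x] for xi in x]


def hessian_fd(a, x, eps, h=1e-4) -> List[List[float]]:
    """Central finite differences of the same loss, for comparison."""
    n = len(a)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                b = list(a)
                b[i] += di
                b[j] += dj
                return loss(b, x, eps)
            H[i][j] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return H
```

The two matrices agree to about 1e-8 for moderate arguments; the term $-x_g\,d^2F/dg^2$ is what makes the second derivative indefinite away from the optimum, which is one reason the estimation of second derivatives is harder than that of the gradient.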
In the case of a multilayer neural network with sequential connections
Here
The use of these expressions for the design of the corresponding algorithms for multilayer neural networks with a complex structure is complicated. However, as the complexity of the open-loop neural network structure increases, multilayer neural network methods reduce the need to estimate the second derivatives of the secondary optimization functional. For the two-layer system with cross connections,
In the case of neurons with the feedback loop,

$$\frac{\partial^2 y(n)}{\partial a_i\,\partial a_j} = \frac{d^2F(g)}{dg^2}\,x_i(n)\,x_j(n);\qquad \frac{\partial^2 y(n)}{\partial a_y^2} = \frac{d^2F(g)}{dg^2}\,\left[y(n-1)\right]^2\quad(9.22)$$


The obtained expressions form the basis for the design of the multilayer neural 
network adjustment algorithms with the use of the second derivative of the secondary 
optimization functional. 
Chapter 10


Adjustment of Continuum Neural Networks*


This chapter deals with a continuum model of the two-layer neural network with continuum neurons in the first layer and, in the second layer, a neuron with a continuum of features at the input. The model structure in this case is described by the expression

$$y(n) = \operatorname{sign}\int_I a_2(i,n)\,\operatorname{sign}\left[\sum_{l=1}^{L}a_{1l}(i,n)\,x_l(n) + a_0(i,n)\right]di\quad(10.1)$$

Here $x_l(n)$ is the l-th component of the feature vector; y is the neural network output signal; $a_{1l}(i,n)$ is the l-th component of the weighting vector-function of the first layer; $a_0(i,n)$ is the threshold function of the first layer; and $a_2(i,n)$ is the weighting function of the second layer.

The goal of this chapter is to derive expressions for the recurrent adjustment procedures of the two-layer continuum neural network weighting coefficients and to analyze the peculiarities of these procedures.

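The recurrent procedure for a neuron with a finite number of features is taken as the starting point:

$$a(n+1) = a(n) - K^*\left.\frac{dY(a)}{da}\right|_{a=a(n)}\quad(10.1a)$$

Here Y(a) is the secondary optimization functional; a(n) is the system state vector (the current value of the argument of the function being extremized); $K^*[L^0\times L^0]$ is the matrix of coefficients; and $L^0$ is the dimensionality of the vector a:

$$a = \begin{bmatrix}a_1\\ \vdots\\ a_i\\ \vdots\\ a_{L^0}\end{bmatrix},\qquad K^* = \begin{bmatrix}K_{11} & \cdots & K_{1j} & \cdots & K_{1L^0}\\ \vdots & & \vdots & & \vdots\\ K_{i1} & \cdots & K_{ij} & \cdots & K_{iL^0}\\ \vdots & & \vdots & & \vdots\\ K_{L^0 1} & \cdots & K_{L^0 j} & \cdots & K_{L^0 L^0}\end{bmatrix}$$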


* This chapter is written in collaboration with Yu. I. Fomin.

10.1
Adjustment of a Neuron with a Feature Continuum 


The relationship (10.1a) can be written in the scalar form

$$a_i(n+1) = a_i(n) - \sum_{j=1}^{L^0}K_{ij}\left.\frac{\partial Y}{\partial a_j}\right|_{a_j=a_j(n)}\quad(10.2)$$

As $L^0\to\infty$, the indices i and j of the vector a components are replaced by parameters i and j that vary continuously in some region I. The summation over j is replaced by an integral taken over the parameter j. The expression for the recurrent adjustment procedure in the case of the feature continuum is

$$a(i,n+1) = a(i,n) - \int_I K^*(i,j)\left.\frac{\partial Y(a(j))}{\partial a(j)}\right|_{a(j)=a(j,n)}dj\quad(10.3)$$

Here $K^*(i,j)$ is a function of the two variables i and j.


10.2 
Adjustment of the Continuum Neuron Layer 


Each neuron from the continuous set in this layer corresponds to some value of the parameter i varying continuously in some region I′, for example, an interval (c, d).

The recurrent adjustment procedure of the continuum neuron layer has the form

$$a(i,n+1) = a(i,n) - K^*(i)\left.\frac{dY(a(i))}{da(i)}\right|_{a(i)=a(i,n)}\quad(10.4)$$

Here $K^*(i)$ is the $[L^0\times L^0]$ matrix of functions of the parameter i, and $L^0$ is the dimensionality of the feature vector (or of the vector-function a(i)).


10.3 
Selection of the Parameter Matrix for the Learning Procedure of the 
Continuum Neuron Layer on the Basis of the Random Sample Data 


The structure of the neural network continuum layer used as an independent system ($a_2(i) = 1$) is described by the expression

$$y = \operatorname{sign}\int_I \operatorname{sign}\left[\sum_{l=1}^{L^0}a_{1l}(i)\,x_l + a_0(i)\right]di,$$

where $L^0$ is the dimensionality of the feature space.
The recurrent learning procedure of the neuron layer has the form

$$a(i,n+1) = a(i,n) - K^*(i)\left.\frac{dY(a(i))}{da(i)}\right|_{a(i)=a(i,n)}\quad(10.5)$$

We have not yet considered the problem of selecting the matrix $K^*$. Evidently, $K^*(i) = K^*(i,n)$, since it depends on the adjustment step. Let us analyze the dependence of this function matrix on the step number n and the parameter i under the condition that $a_2(i) = 1$ for all n (for simplicity).

Let us consider the particular case of a diagonal matrix:

$$K^*(i) = \begin{bmatrix}K_0(i) & 0 & \cdots & 0\\ 0 & K_1(i) & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & K_L(i)\end{bmatrix}$$

The continuum of neurons realizes a continuum of hyperplanes in the feature space. Each value of the parameter i corresponds to a neuron with the discriminant function

$$g(y,i) = \sum_{l=1}^{L}a_l(i)\,y_l + a_0(i)\quad(10.6)$$

At the points of the i-th hyperplane, g(y,i) = 0. The distance from a point y to the hyperplane is proportional to g(y,i). Let us call R(i) = g(y,i) the distance between the point y and the hyperplane corresponding to the parameter i. Let us assume that if y belongs to the first class, then the correct orientation of the i-th hyperplane is the one providing R(i) > 0.

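Let R(i) = R(n,i) at the n-th step of the procedure, where

$$R(n,i) = \sum_{l=1}^{L}a_l(n,i)\,y_l + a_0(n,i)\quad(10.7)$$

In this case, in scalar form,

$$a_l(n+1,i) = a_l(n,i) - K_l(n,i)\left.\frac{\partial Y(a(i))}{\partial a_l(i)}\right|_{a_l(i)=a_l(n,i)},\qquad l = 0,\dots,L\quad(10.8)$$

Then

$$R(n+1,i) = \sum_{l=1}^{L}a_l(n+1,i)\,y_l + a_0(n+1,i)\quad(10.9)$$

can be written in the form

$$R(n+1,i) = \sum_{l=1}^{L}\left[a_l(n,i) - K_l(n,i)\left.\frac{\partial Y(a(i))}{\partial a_l(i)}\right|_{a_l(i)=a_l(n,i)}\right]y_l + a_0(n,i) - K_0(n,i)\left.\frac{\partial Y(a(i))}{\partial a_0(i)}\right|_{a_0(i)=a_0(n,i)}$$

Taking into account (10.7), one obtains

$$R(n+1,i) = R(n,i) - \sum_{l=1}^{L}K_l(n,i)\left.\frac{\partial Y(a(i))}{\partial a_l(i)}\right|_{a_l(i)=a_l(n,i)}y_l - K_0(n,i)\left.\frac{\partial Y(a(i))}{\partial a_0(i)}\right|_{a_0(i)=a_0(n,i)}\quad(10.10)$$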


Since the sum (integral) of the output signals of the neuron layer is used for classifying points in the feature space, the point y will be classified correctly at the (n + 1)-th step if R(n + 1, i) is selected as follows:

$$R(n+1,i) = \begin{cases}-\varepsilon_0\,R(n,i), & R(n,i) < 0\\ R(n,i), & R(n,i) > 0\end{cases}\quad(10.11)$$

(it is assumed that y belongs to the first class). Here $\varepsilon_0 > 0$. In order to determine $K_l(n,i)$, l = 0, …, L, from (10.10), it is necessary to take L + 1 points and to calculate R(n + 1, i) for each point according to (10.11). A system of L + 1 linear equations for the functions $K_l(n,i)$, l = 0, …, L, is then obtained. Let us rewrite (10.10) in vector form:


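$$R(n+1,i) - R(n,i) = -K(n,i)\,A_{YX}\quad(10.12)$$

Here $K(n,i) = (K_0(n,i),\dots,K_L(n,i))$ is the diagonal of the matrix $K^*(i)$; R(n,i) and R(n+1,i) are row vectors of the values for the L + 1 points; and the matrix $A_{YX}$ is

$$A_{YX} = \begin{bmatrix}\dfrac{\partial Y(a(i))}{\partial a_0(i)} & \cdots & \dfrac{\partial Y(a(i))}{\partial a_0(i)}\\ y_{1,1}\dfrac{\partial Y(a(i))}{\partial a_1(i)} & \cdots & y_{1,L+1}\dfrac{\partial Y(a(i))}{\partial a_1(i)}\\ \vdots & & \vdots\\ y_{L,1}\dfrac{\partial Y(a(i))}{\partial a_L(i)} & \cdots & y_{L,L+1}\dfrac{\partial Y(a(i))}{\partial a_L(i)}\end{bmatrix},$$

where $y_{l,m}$ is the l-th coordinate of the m-th point. It follows from (10.12) that

$$K(n,i) = -\left(R(n+1,i) - R(n,i)\right)A_{YX}^{-1}\quad(10.13)$$

This is the final expression for the parameter matrix of functions at the n-th step of the procedure.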
For averaging, one must take $m_n$ samples, each consisting of L + 1 vectors, and then select the matrix

$$K^*(i,n) = \frac{1}{m_n}\sum_{m=1}^{m_n}K^*_m(i,n)\quad(10.14)$$

The diagonal of the matrix $K^*_m(i,n)$ is calculated by (10.13) for each sample.
The difficulty of this method lies in the choice of the optimal $\varepsilon_0 > 0$.
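The selection (10.11)–(10.13) at one fixed value of i reduces to a small linear system. A minimal sketch, assuming L + 1 first-class points, a single (already averaged) gradient value per coefficient, and $\varepsilon_0 > 0$ given; the case R = 0, which (10.11) leaves open, is treated here like R > 0. The helper names are hypothetical, and the system is solved by plain Gaussian elimination to keep the example dependency-free:

```python
from typing import List, Sequence, Tuple


def solve(M: List[List[float]], rhs: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(M)
    A = [list(row) + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[p][c]) < 1e-12:
            raise ValueError("points or gradients give a singular A_YX")
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][k] * sol[k] for k in range(r + 1, n))
        sol[r] = (A[r][n] - s) / A[r][r]
    return sol


def select_gains(a: Sequence[float], points: Sequence[Sequence[float]],
                 grad: Sequence[float], eps0: float
                 ) -> Tuple[List[float], List[float], List[float]]:
    """Diagonal K(n, i) from (10.11)-(10.13) at one fixed value of i.

    a      -- [a_0, a_1, ..., a_L] at step n
    points -- L + 1 first-class points, each [y_1, ..., y_L]
    grad   -- [dY/da_0, ..., dY/da_L] at step n (assumed nonzero)
    Returns (K, coefficients after the step (10.8), target R values).
    """
    L = len(a) - 1
    Y = [[1.0] + list(p) for p in points]                  # y_0 = 1 feeds a_0
    R = [sum(al * yl for al, yl in zip(a, y)) for y in Y]  # (10.7)
    R_target = [-eps0 * r if r < 0 else r for r in R]      # (10.11)
    A = [[Y[m][l] * grad[l] for m in range(L + 1)] for l in range(L + 1)]
    dR = [rt - r for rt, r in zip(R_target, R)]
    # K A = -dR  <=>  A^T K^T = -dR^T, i.e. K = -dR A^{-1} as in (10.13)
    At = [[A[l][m] for l in range(L + 1)] for m in range(L + 1)]
    K = solve(At, [-d for d in dR])
    a_next = [al - kl * gl for al, kl, gl in zip(a, K, grad)]   # (10.8)
    return K, a_next, R_target
```

After the step, every one of the L + 1 points lies at the prescribed distance R(n + 1, i); misclassified points are moved to the correct side with the factor $\varepsilon_0$. The points must be affinely independent, otherwise $A_{YX}$ is singular.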


10.4

Selection of the Parameter Matrix K*(i,j) for the Learning Procedure
of the Neuron with a Feature Continuum on the Basis of the Random
Sample Data


The distance estimation method of the previous section can also be used for selecting the parameter matrix in the case of neurons with a feature continuum. The recurrent learning procedures for the neuron with a feature continuum have the form

$$a(i,n+1) = a(i,n) - \int_I K^*(i,j)\left.\frac{\partial Y}{\partial a(j)}\right|_{a(j)=a(j,n)}dj;\qquad a_0(n+1) = a_0(n) - K_0\left.\frac{\partial Y}{\partial a_0}\right|_{a_0=a_0(n)}\quad(10.15)$$


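The distance from the point x(n,i) to the hyperplane $S(x(i)) = \int_I a(i)\,x(i)\,di + a_0 = 0$ is

$$R_n(x(n,i)) = \int_I a(n,i)\,x(n,i)\,di + a_0(n);\qquad R_{n+1}(x(n,i)) = \int_I a(n+1,i)\,x(n,i)\,di + a_0(n+1)\quad(10.16)$$

Taking into account (10.15) and (10.16), one obtains

$$R_{n+1}(x(n,i)) = \int_I a(n,i)\,x(n,i)\,di - \int_I\!\int_I K^*(i,j)\left.\frac{\partial Y}{\partial a(j)}\right|_{a(j)=a(j,n)}x(n,i)\,dj\,di + a_0(n) - K_0\left.\frac{\partial Y}{\partial a_0}\right|_{a_0=a_0(n)}$$

$$\Rightarrow\quad R_{n+1}(x(n,i)) - R_n(x(n,i)) = -\int_I\!\int_I K^*(i,j)\left.\frac{\partial Y}{\partial a(j)}\right|_{a(j)=a(j,n)}x(n,i)\,dj\,di - K_0\left.\frac{\partial Y}{\partial a_0}\right|_{a_0=a_0(n)}\quad(10.17)$$

If x(n,i) belongs to the first class and $R_n(x(n,i)) < 0$, then one selects $R_{n+1}(x(n,i)) = -\varepsilon_0 R_n(x(n,i))$, $\varepsilon_0 > 0$.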
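The function $K^*(i,j)$ and the coefficient $K_0$ cannot be determined from Eq. (10.17) unambiguously. Let us impose some constraints on the function $K^*(i,j)$ by introducing the function

$$K^*(i,j) = \frac{1}{K_I}\,\left.\frac{\partial Y}{\partial a(i)}\right|_{a(i)=a(i,n)}\left(\left.\frac{\partial Y}{\partial a(j)}\right|_{a(j)=a(j,n)}\right)^{-1},\quad(10.18)$$

where $K_I = d - c$. Equation (10.15) in this case becomes

$$a(i,n+1) = a(i,n) - \left.\frac{\partial Y}{\partial a(i)}\right|_{a(i)=a(i,n)},\quad(10.19)$$

because, taking into account (10.18),

$$\int_I K^*(i,j)\left.\frac{\partial Y}{\partial a(j)}\right|_{a(j)=a(j,n)}dj = \frac{1}{K_I}\left.\frac{\partial Y}{\partial a(i)}\right|_{a(i)=a(i,n)}\int_c^d dj = \left.\frac{\partial Y}{\partial a(i)}\right|_{a(i)=a(i,n)}.$$

It follows from (10.19) that the function (10.18) is a continuum analogue of the unit parameter matrix $K^*$. Taking into account (10.18), one obtains for (10.17)

$$R_{n+1}(x(n,i)) - R_n(x(n,i)) = -\int_I\left.\frac{\partial Y}{\partial a(i)}\right|_{a(i)=a(i,n)}x(n,i)\,di - K_0\left.\frac{\partial Y}{\partial a_0}\right|_{a_0=a_0(n)}\quad(10.20)$$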
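Consequently,

$$K_0 = \frac{R_n(x(n,i)) - R_{n+1}(x(n,i)) - \int_I\left.\dfrac{\partial Y}{\partial a(i)}\right|_{a(i)=a(i,n)}x(n,i)\,di}{\left.\dfrac{\partial Y}{\partial a_0}\right|_{a_0=a_0(n)}}$$

In the case of incorrect classification ($R_{n+1} = -\varepsilon_0 R_n$),

$$K_0 = \frac{(1+\varepsilon_0)\,R_n(x(n,i)) - \int_I\left.\dfrac{\partial Y}{\partial a(i)}\right|_{a(i)=a(i,n)}x(n,i)\,di}{\left.\dfrac{\partial Y}{\partial a_0}\right|_{a_0=a_0(n)}}\quad(10.21)$$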


One possible way to solve the problem of selecting the parameter coefficients for a neuron with a feature continuum is therefore given by (10.18) and (10.21). As in the previous section, $K_0$ and $K^*(i,j)$ can be averaged across $m_n$ points of the function space.

As in the previous section, the difficulty of this method lies in the choice of the optimal $\varepsilon_0$.
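The choice (10.18) with (10.21) can be checked on a discretized feature continuum. A minimal sketch, assuming the interval I = (c, d) is split into M equal cells, integrals are replaced by rectangle sums, and the gradient values $\partial Y/\partial a(i)$ are given and nonzero on every cell (the kernel (10.18) divides by them). It confirms that the kernel acts as a unit operator, as in (10.19), and that the resulting step moves a misclassified point to $R_{n+1} = -\varepsilon_0 R_n$. The function name is hypothetical:

```python
from typing import List, Sequence, Tuple


def feature_continuum_step(a: Sequence[float], a0: float, x: Sequence[float],
                           grad_a: Sequence[float], grad_a0: float,
                           c: float, d: float, eps0: float
                           ) -> Tuple[List[float], float, float, float, float]:
    """One step of (10.15) with the kernel (10.18) and K_0 from (10.21).

    a, x, grad_a are samples of a(i), x(n,i), dY/da(i) on M equal cells of (c, d).
    Returns (new a, new a0, K_0, R_n, R_{n+1}).
    """
    M = len(a)
    delta = (d - c) / M
    K_I = d - c

    def integral(values: Sequence[float]) -> float:
        return sum(values) * delta                    # rectangle rule

    R_n = integral([ai * xi for ai, xi in zip(a, x)]) + a0
    # kernel (10.18) sampled on the grid
    kernel = [[grad_a[p] / (K_I * grad_a[q]) for q in range(M)] for p in range(M)]
    # integral of K*(i, j) dY/da(j) dj; it reproduces grad_a[p], cf. (10.19)
    step = [integral([kernel[p][q] * grad_a[q] for q in range(M)]) for p in range(M)]
    K0 = ((1.0 + eps0) * R_n
          - integral([g * xi for g, xi in zip(grad_a, x)])) / grad_a0   # (10.21)
    new_a = [ai - s for ai, s in zip(a, step)]
    new_a0 = a0 - K0 * grad_a0
    R_next = integral([ai * xi for ai, xi in zip(new_a, x)]) + new_a0
    return new_a, new_a0, K0, R_n, R_next
```

Because the kernel cancels the gradient ratio, the whole burden of placing the point at the prescribed distance falls on the single scalar $K_0$; this is what makes the selection unambiguous once (10.18) is imposed.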
10.5
Characteristic Properties of the Two-Layer Continuum Neural Network
Adjustment Algorithm


These characteristic properties relate to the replacement of the structural matrix of coefficients $K^*$ by the matrix of functions $K^*(i)$ in the transition to the first-layer continuum, or by the function of two variables $K^*(i,j)$ in the transition from the neuron with a finite number of features to the feature continuum in the second layer. The selection algorithm for $K^*(i)$ requires the solution of a system of L + 1 equations (if $K^*(i)$ is a diagonal matrix of functions). In the case of a finite number of hyperplanes, one needs to find a diagonal matrix of coefficients for each hyperplane, i.e., to solve H systems of L + 1 equations (H is the number of neurons in the layer).

The algorithm for selecting $K^*(i,j)$ consists in solving the integral equation for $K^*(i,j)$ and the coefficient $K_0$. Figures 10.1 and 10.2 show the open-loop structure of the continuum neural network and the block scheme of its learning algorithm.


10.6 
Three Variants of Implementation of the Continuum Neuron Layer 
Weighting Functions and Corresponding Learning Procedures 


This section deals with the open-loop continuum neuron layer structure and its ad- 
justment depending on the method of weighting function implementation. 


Fig. 10.1. Open-loop structure of the two-layer continuum neural network


Fig. 10.2. Block scheme of the two-layer continuum neural network at the (n + 1)-th step (blocks: selection of K*(i); selection of K*(i,j))
Fig. 10.3. Output signal of the continuum neuron layer


The open-loop continuum neuron layer structure is described by the expression

$$x_2(i) = \operatorname{sign}\left[\sum_{l=1}^{L}a_l(i)\,x_l + a_0(i)\right],\quad(10.22)$$

where $x_2(i)$ is the layer output signal; $x_l$ (l = 1, …, L) are the components of the input signal vector; and a(i) is the vector of weighting functions of the continuum layer, belonging to the class of piecewise differentiable functions with discontinuities of the first kind and a finite number of zeros.

It follows from (10.22) that $x_2(i)$ is a function of the form shown in Fig. 10.3: a sequence of rectangles of different durations and unit amplitude. The function under the signum sign,

$$f(i) = \sum_{l=1}^{L}a_l(i)\,x_l + a_0(i),$$

can have a complex form. However, the form of the function $x_2(i)$ is determined only by the number and location of the zeros of f(i); its form between neighboring zeros is not important. Hence, the weighting functions a(i) can be approximated rather roughly (Fig. 10.4) or more precisely (Fig. 10.5).

The adjustment procedure is organized as follows. The approximation interval is divided into S parts. A set of piecewise constant weighting functions (each determined by S coefficients) or piecewise linear weighting functions (each determined by 2S coefficients) is then constructed. The coefficients that determine the weighting functions are adjusted by the gradient method.

The probability of correct recognition is used as the neural network quality criterion. It is calculated on a test sample of vectors. If the probability of correct recognition after the given number of steps of the learning procedure is less than the required value, the number S is doubled and the adjustment procedure for the adjustable coefficients is repeated.
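The outer loop of this procedure can be sketched as follows. It is a minimal control-flow illustration only: `train_and_test` is a hypothetical callable standing in for "adjust the S-piece weighting functions by the gradient method for the given number of steps, then estimate the probability of correct recognition on the test sample", and the cap `s_max` is an added safeguard that the text does not mention:

```python
from typing import Callable, Tuple


def fit_with_doubling(train_and_test: Callable[[int], float], s_initial: int,
                      p_required: float, s_max: int) -> Tuple[int, float]:
    """Double the number S of approximation intervals until the estimated
    probability of correct recognition reaches p_required (or S hits s_max)."""
    S = s_initial
    while True:
        p = train_and_test(S)        # adjust coefficients, then evaluate on test sample
        if p >= p_required or S >= s_max:
            return S, p
        S *= 2                       # refine the partition and repeat the adjustment


if __name__ == "__main__":
    # Toy stand-in: accuracy grows with the number of intervals.
    print(fit_with_doubling(lambda S: min(1.0, S / 16), 2, 0.9, 64))
```

Doubling (rather than incrementing) S keeps the number of retraining rounds logarithmic in the final S, and each finer partition contains the previous one, so the previous weighting function can serve as the initial condition for the next round.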
Fig. 10.4. Piecewise constant approximation of the weighting function of the continuum neuron layer


Fig. 10.5. Piecewise linear approximation of the weighting function of the continuum neuron layer


Fig. 10.6. Weighting function of the continuum neuron layer

As a result, the following important goal is achieved. The layer with a large number of neurons is replaced by the continuum neuron layer, i.e., N weighting coefficients (for each feature) are replaced by a weighting function curve. This weighting function is approximated by the piecewise linear or piecewise constant function required for a sufficient recognition probability. The resulting weighting function is described by a fairly small number of parameters, at least smaller than the number of parameters of a discrete neuron layer. The piecewise constant approximation of the weighting function curve is represented in Fig. 10.6.

Renumbering (ranking) the neurons in the layer so that the weighting function is maximally monotone (has the minimum number of sign changes of its derivative) minimizes the number of weighting function approximation intervals.
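The renumbering criterion can be made concrete for a small discrete layer. A minimal sketch, assuming the weights are given as `W[h][l]` (neuron h, feature l) and that the same neuron order must be used for all features; the text does not say how the ordering is found, so this example simply uses exhaustive search over permutations, which is only practical for a handful of neurons. Function names are hypothetical:

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def sign_changes(seq: Sequence[float]) -> int:
    """Number of sign changes of the first difference (flat steps are skipped)."""
    diffs = [b - a for a, b in zip(seq, seq[1:]) if b != a]
    return sum(1 for u, v in zip(diffs, diffs[1:]) if (u > 0) != (v > 0))


def ordering_cost(W: List[List[float]], order: Sequence[int]) -> int:
    """Total sign changes of all weighting functions a_l(i) for a given order."""
    return sum(sign_changes([W[h][l] for h in order]) for l in range(len(W[0])))


def best_ordering(W: List[List[float]]) -> Tuple[Tuple[int, ...], int]:
    """Exhaustive search for the most monotone renumbering of the neurons."""
    best = min(permutations(range(len(W))), key=lambda o: ordering_cost(W, o))
    return best, ordering_cost(W, best)
```

For a single feature, sorting the weights is already optimal (cost 0); with several features a common order generally cannot make every weighting function monotone, which is why the text speaks of a *maximally* monotone function.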
10.7
Learning Algorithm with α_2 Secondary Optimization Functional
(the Five-Feature Space) for the Two-Layer Continuum Neural Network


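The structure of such a neural network is described by the functional expression

$$y = \operatorname{sign}\int_I a_2(i)\,\operatorname{sign}\left[\sum_{l=1}^{5}a_{1l}(i)\,x_l + a_0(i)\right]di\quad(10.23)$$

Let us consider the case of the functional $\alpha_2$. The square of the discrete error magnitude is

$$\overline{x_g^2(n)}^{m_n} = \overline{\left[\varepsilon(n) - \operatorname{sign}g(n)\right]^2}^{m_n},\quad(10.24)$$

where

$$g(n) = \int_I a_2(i,n)\,\operatorname{sign}\left[\sum_{l=1}^{5}a_{1l}(i,n)\,x_l(n) + a_0(i,n)\right]di$$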

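10.7.1
Learning Algorithm for the Second Layer (Feature Continuum Neuron)
Let us calculate the derivative:

$$\frac{\partial x_g^2(n)}{\partial a_2(i,n)} = -2\,x_g(n)\,\frac{\partial(\operatorname{sign}g(n))}{\partial a_2(i,n)}$$

Let us designate

$$x_2(i,n) = \operatorname{sign}\left[\sum_{l=1}^{5}a_{1l}(i,n)\,x_l(n) + a_0(i,n)\right]\quad(10.25)$$

Then

$$\frac{\partial(\operatorname{sign}g(n))}{\partial a_2(i,n)} = \frac{\partial\left(\operatorname{sign}\int_I a_2(i,n)\,x_2(i,n)\,di\right)}{\partial a_2(i,n)}$$

The value of this derivative cannot be determined, and it is necessary to use information about its sign:

$$\operatorname{sign}\frac{\partial(\operatorname{sign}g(n))}{\partial a_2(i,n)} = \operatorname{sign}\frac{\partial g(n)}{\partial a_2(i,n)} = \operatorname{sign}x_2(i,n),$$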
because

$$\frac{\partial}{\partial a_2(i,n)}\left[\int_I a_2(i,n)\,x_2(i,n)\,di\right] = x_2(i,n)$$

(Here and below, the averaging sign across $m_n$ is omitted in all expressions except the final ones.) Consequently,

$$\frac{\partial x_g^2(n)}{\partial a_2(i,n)} = -2\,x_g(n)\,\operatorname{sign}x_2(i,n)\quad(10.26)$$

Taking into account (10.26) and (10.3), one obtains the expression for the recurrent adjustment procedure of the second-layer neuron weighting function:

$$a_2(i,n+1) = a_2(i,n) + 2\int_I K^*(i,j)\,\overline{x_g(n)\,\operatorname{sign}x_2(j,n)}^{m_n}\,dj,\quad(10.27)$$

where $x_g(n)$ is determined from (10.24), and $x_2(i,n)$ from (10.25).


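10.7.2
Learning Algorithm for the First Layer (Continuum Neuron Layer)


Let us calculate the derivative:

$$\frac{\partial x_g^2(n)}{\partial a_1(i,n)} = -2\,x_g(n)\,\frac{\partial(\operatorname{sign}g(n))}{\partial a_1(i,n)},\qquad \frac{\partial(\operatorname{sign}g(n))}{\partial a_1(i,n)} = \frac{\partial(\operatorname{sign}g(n))}{\partial x_2(i,n)}\,\frac{\partial x_2(i,n)}{\partial a_1(i,n)},$$

$$\operatorname{sign}\frac{\partial(\operatorname{sign}g(n))}{\partial a_1(i,n)} = \operatorname{sign}a_2(i,n)\,\operatorname{sign}x(n)$$

Consequently,

$$\frac{\partial x_g^2(n)}{\partial a_1(i,n)} = -2\,x_g(n)\,\operatorname{sign}a_2(i,n)\,\operatorname{sign}x(n)\quad(10.28)$$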


Then one obtains the expression for the recurrent adjustment procedure of the first-layer neuron weighting vector-function:

$$a_1(i,n+1) = a_1(i,n) + 2K^*(i)\,\overline{x_g(n)\,\operatorname{sign}a_2(i,n)\,\operatorname{sign}x(n)}^{m_n},\quad(10.29)$$

where $K^*(i)$ is the $[L^0\times L^0]$ matrix of functions of the parameter i, $L^0$ is the dimensionality of the feature vector x(n), and sign x(n) is taken componentwise.
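One adjustment step of (10.27) and (10.29) can be sketched on a discretized continuum. This is a minimal illustration under stated assumptions: the parameter i is sampled on M grid points with spacing `delta` and integrals become rectangle sums; $m_n = 1$ (no averaging); the first-layer matrix $K^*(i)$ is taken as a scalar `k1` times the identity; the second-layer kernel is supplied as samples `K2[p][q]` of $K^*(i_p, i_q)$; and sign(0) = +1 by convention. Function names are hypothetical:

```python
from typing import List, Sequence, Tuple


def sign(v: float) -> float:
    """Two-valued sign; sign(0) = +1 is an assumed convention."""
    return 1.0 if v >= 0 else -1.0


def continuum_step(a1: List[List[float]], a2: List[float], x: Sequence[float],
                   eps: float, K2: List[List[float]], k1: float, delta: float
                   ) -> Tuple[List[List[float]], List[float], float]:
    """One step of (10.27) and (10.29) on a grid i_0, ..., i_{M-1}.

    a1[p] -- [a_0(i_p), a_11(i_p), ..., a_1L(i_p)] (threshold first)
    a2[p] -- a_2(i_p), second-layer weighting function
    x     -- [1, x_1, ..., x_L] (x_0 = 1 feeds the threshold)
    Returns (new a1, new a2, x_g).
    """
    M = len(a2)
    x2 = [sign(sum(w * xl for w, xl in zip(a1[p], x))) for p in range(M)]  # (10.25)
    g = sum(a2[p] * x2[p] for p in range(M)) * delta
    x_g = eps - sign(g)                                                    # (10.24)
    new_a2 = [a2[p] + 2.0 * sum(K2[p][q] * x_g * sign(x2[q]) for q in range(M)) * delta
              for p in range(M)]                                           # (10.27)
    new_a1 = [[w + 2.0 * k1 * x_g * sign(a2[p]) * sign(xl) for w, xl in zip(a1[p], x)]
              for p in range(M)]                                           # (10.29)
    return new_a1, new_a2, x_g
```

Only signs of the intermediate quantities enter the updates, so the step is insensitive to the exact shape of the weighting functions between zeros, in line with the observation about $x_2(i)$ in Sect. 10.6.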
10.8
Continuum Neuron Layer with Piecewise Constant Weighting Functions 


10.8.1 
Open-Loop Layer Structure 


The interval in which the weighting function is generated is divided into equal intervals of length τ. The partition is fixed. The amplitudes $a_s$ of the rectangles (s is the number of the partition interval) are adjusted in the learning process (Fig. 10.7).

Let us introduce the function

$$h(i) = \begin{cases}1, & i > 0\\ 0, & i < 0\end{cases}$$

Then a(i) can be represented as

$$a(i) = \sum_{s=1}^{S}a_s\left[h(i-(s-1)\tau) - h(i-s\tau)\right]$$

Let us introduce the designation

$$H(i,s) = h(i-(s-1)\tau) - h(i-s\tau)$$

Then

$$a(i) = \sum_{s=1}^{S}a_s\,H(i,s)\quad(10.30)$$

The neuron continuum is described by the expression

$$x_2(i) = \operatorname{sign}\left[\sum_{l=1}^{L}a_l(i)\,x_l + a_0(i)\right]\quad(10.31)$$


Fig. 10.7. Diagram of the open-loop structure for the neural network continuum layer with the piecewise constant weighting functions
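In (10.31),

$$a_l(i) = \sum_{s=1}^{S}a_{ls}\,H(i,s),\qquad l = 0,\dots,L$$

Then

$$x_2(i) = \operatorname{sign}\left[\sum_{s=1}^{S}H(i,s)\left(\sum_{l=1}^{L}a_{ls}\,x_l + a_{0s}\right)\right]\quad(10.32)$$

Using the definition of the functions H(i,s) (for every i in (0, Sτ) other than a partition point, exactly one H(i,s) equals 1 and the others are 0), one obtains

$$x_2(i) = \sum_{s=1}^{S}H(i,s)\,\operatorname{sign}\left[\sum_{l=1}^{L}a_{ls}\,x_l + a_{0s}\right]\quad(10.32a)$$

It follows from (10.32) that this structure represents the following neuron layer:

$$x_2(i) = \sum_{s=1}^{S}H(i,s)\,\operatorname{sign}\left[\sum_{l=1}^{L}a_{ls}\,x_l + a_{0s}\right]\quad(10.33)$$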


In this structure the neuron outputs are connected to the input of the next neuron layer in turn, each during an equal interval τ of i. The function H(i,s) acts as a commutation switch for the neuron layer outputs. The structure corresponding to (10.33) is shown in Fig. 10.7.
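The equivalence between the continuum form (10.32) and the switched layer of S ordinary neurons (10.33) can be checked directly. A minimal sketch, assuming the step convention h(0) = 0 (so that H(i,s) is the indicator of the half-open interval ((s−1)τ, sτ]), the convention sign(0) = +1, and coefficients given as `coeffs[s-1] = [a_0s, a_1s, ..., a_Ls]`; function names are hypothetical:

```python
from typing import Sequence


def sign(v: float) -> int:
    return 1 if v >= 0 else -1              # sign(0) = +1 assumed


def h(i: float) -> float:
    """Unit step: 1 for i > 0, 0 otherwise (h(0) = 0 is a convention)."""
    return 1.0 if i > 0 else 0.0


def H(i: float, s: int, tau: float) -> float:
    """Indicator of the s-th partition interval ((s-1) tau, s tau]."""
    return h(i - (s - 1) * tau) - h(i - s * tau)


def neuron_sum(c: Sequence[float], x: Sequence[float]) -> float:
    """a_0s + sum_l a_ls x_l for one partition interval."""
    return c[0] + sum(a * xl for a, xl in zip(c[1:], x))


def output_10_32(i: float, coeffs, x, tau: float) -> int:
    """x_2(i) = sign[ sum_s H(i,s) (sum_l a_ls x_l + a_0s) ]."""
    S = len(coeffs)
    return sign(sum(H(i, s, tau) * neuron_sum(coeffs[s - 1], x) for s in range(1, S + 1)))


def output_10_33(i: float, coeffs, x, tau: float) -> float:
    """x_2(i) = sum_s H(i,s) sign[...]: S neurons switched in turn by H(i,s)."""
    S = len(coeffs)
    return sum(H(i, s, tau) * sign(neuron_sum(coeffs[s - 1], x)) for s in range(1, S + 1))
```

In hardware terms, the second form needs only S threshold elements and a commutator driven by i, which is the point of the piecewise constant approximation.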


10.8.2 
Recurrent Adjustment Procedure for the Piecewise Constant Weighting Functions 


The recurrent adjustment procedure of the continuum neuron layer in the case of a two-layer neural network has the form

$$a_1(i,n+1) = a_1(i,n) + 2K^*(i)\,\overline{x_g(n)\,\operatorname{sign}a_2(i,n)\,\operatorname{sign}x(n)}^{m_n}\quad(10.34)$$

Let us consider the particular case of the learning procedure with a diagonal matrix of functions $K^*(i)$.

Let the weighting function of the second-layer neuron be piecewise constant. Then, on each s-th interval,

$$a_2(i,n) = a_{2s}(n) = \text{const},\quad(10.34a)$$

since

$$a_2(i) = \sum_{s=1}^{S}a_{2s}\,H(i,s)\quad(10.35)$$

Let the parameter matrix be piecewise constant as well:

$$K^*(i) = \sum_{s=1}^{S}K_s\,H(i,s),\quad(10.36)$$
where $K_s$ is a diagonal matrix at the s-th interval. Then the adjustment procedure for the s-th approximation interval will have the form


$$a_s(n+1)=a_s(n)+2K_s\,x_g(n)\,\operatorname{sign}a_{2s}(n)\,\operatorname{sign}x(n). \tag{10.37}$$
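A minimal sketch of one step of (10.37), applied to every interval s at once. It assumes that x(n) is the layer input vector, so that the diagonal matrix $K_s$ multiplies the element-wise vector sign x(n); that reading, the list-based layout and the function name are assumptions, not the book's notation.

```python
def sign(v):
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)


def update_intervals(a, k_diag, a2, x_g, x_in):
    """One step of (10.37) for every interval s.

    a[s]      : weight vector a_s(n) on interval s
    k_diag[s] : diagonal entries of K_s (same length as a[s])
    a2[s]     : second-layer weight a_{2s}(n), constant on interval s (10.34a)
    x_g       : error signal x_g(n)
    x_in      : layer input vector x(n); sign() is taken element-wise (assumption)
    """
    new_a = []
    for a_s, k_s, a2_s in zip(a, k_diag, a2):
        factor = 2.0 * x_g * sign(a2_s)
        new_a.append([w + factor * k * sign(xj) for w, k, xj in zip(a_s, k_s, x_in)])
    return new_a


a_next = update_intervals(
    a=[[0.0, 0.0], [1.0, 1.0]],
    k_diag=[[0.1, 0.1], [0.5, 0.5]],
    a2=[1.0, -2.0],
    x_g=0.5,
    x_in=[1.0, -3.0],
)
```

Because the second-layer weight is constant on each interval, a single scalar sign a_{2s}(n) governs the whole interval, so the continuum adjustment reduces to S independent finite-dimensional updates.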


10.8.3 
About Matrix K’(i) Estimation 


The expression for the diagonal components of matrix K'(i) was obtained above for the two-layer continuum neural network:

$$K'(i)=-\bigl(R_{n+1}(i)-R_n(i)\bigr)A_{yx}^{-1}. \tag{10.38}$$

Here $A_{yx}^{-1}=C_{yx}\operatorname{sign}a_2(i)$, where $C_{yx}$ is a numerical matrix (independent of i); $a_2(i)$ in this case is a piecewise constant weighting function of the second layer neuron, i.e., it is constant at the s-th interval.

Taking into account (10.35), 


$$R_n(i)=\sum_{s=1}^{S}R_s(n)H(i,s),\qquad R_{n+1}(i)=\sum_{s=1}^{S}R_s(n+1)H(i,s). \tag{10.39}$$


Equations (10.36) and (10.39) give for the s-th interval 


$$K_s(n)=-\bigl(R_s(n+1)-R_s(n)\bigr)A_{yx}^{-1}. \tag{10.40}$$


10.9 
Continuum Neuron Layer with Piecewise Linear Weighting Functions 


10.9.1 
Open-Loop Structure of the Neuron Layer 


Let us consider the continuum neuron layer with weighting functions shown in 
Fig. 10.8: 


$$a_l(i)=\sum_{s=1}^{S}\bigl(a_{1ls}\,i+a_{0ls}\bigr)H(i,s),\qquad l=0,1,\dots,L. \tag{10.41}$$
The output signal has the form

$$x(i)=\operatorname{sign}\Bigl[\sum_{l=1}^{L}\sum_{s=1}^{S}\bigl(a_{1ls}\,i+a_{0ls}\bigr)H(i,s)\,x_l+\sum_{s=1}^{S}\bigl(a_{10s}\,i+a_{00s}\bigr)H(i,s)\Bigr].$$
Fig. 10.8. Weighting functions with piecewise linear approximation for the continuum neural network

Fig. 10.9. Diagram of the open-loop structure for the continuum neural network layer with piecewise linear weighting functions (s-th channel)
Consequently,

$$x(i)=\sum_{s=1}^{S}H(i,s)\,\operatorname{sign}\Bigl[\sum_{l=1}^{L}\bigl(a_{1ls}\,i+a_{0ls}\bigr)x_l+\bigl(a_{10s}\,i+a_{00s}\bigr)\Bigr]. \tag{10.42}$$


The structure (10.42) can be considered as a layer of S threshold elements with weighting coefficients that depend linearly on i. As i changes, the functions H(i, s) connect the neuron outputs in succession to the input of the next layer. The layer structure is shown in Fig. 10.9.
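The layer (10.42) differs from (10.33) only in that each neuron's coefficients vary linearly with i inside its interval. A minimal sketch, assuming a uniform partition of length `tau`, h(0) = 0, and a hypothetical layout in which row s of `slope` and `intercept` holds a_{1ls}, a_{0ls} for the inputs x_1..x_L followed by the threshold terms a_{10s}, a_{00s} (paired with a constant input of 1):

```python
def step(v):
    return 1.0 if v > 0 else 0.0  # h(v); v == 0 mapped to 0 (assumption)


def sign(v):
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)


def piecewise_linear_layer(i, x, slope, intercept, tau):
    """Output x(i) of (10.42).

    slope[s-1], intercept[s-1]: coefficients a_{1ls}, a_{0ls} for x_1..x_L, with the
    last entry holding the threshold terms a_{10s}, a_{00s} (input fixed at 1).
    """
    x_aug = list(x) + [1.0]
    total = 0.0
    for s in range(1, len(slope) + 1):
        h_is = step(i - (s - 1) * tau) - step(i - s * tau)   # H(i, s)
        if h_is:
            net = sum((k1 * i + k0) * xl
                      for k1, k0, xl in zip(slope[s - 1], intercept[s - 1], x_aug))
            total += h_is * sign(net)
    return total


# Hypothetical example with one input (L = 1) and S = 2 intervals of length tau = 1.
slope = [[4.0, 0.0], [0.0, -1.0]]
intercept = [[-1.0, 0.0], [0.0, 0.5]]
values = [piecewise_linear_layer(i, [2.0], slope, intercept, 1.0) for i in (0.1, 0.5, 1.5)]
```

Within the first interval the output changes sign between i = 0.1 and i = 0.5 because the weight itself crosses zero, something a piecewise constant weighting function cannot do inside one interval.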


10.9.2 
Recurrent Adjustment Procedure for the Piecewise Linear Weighting Functions 


Similar to the previous section, matrix K’(i) has the following form under the condi- 
tion of the piecewise constant weighting function of the second layer of neurons: 


$$K'(i)=\sum_{s=1}^{S}\bigl(K_{1s}\,i+K_{0s}\bigr)H(i,s). \tag{10.43}$$
Consequently, the layer learning procedure at the s-th interval has the following form:


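$$\begin{aligned}a_{1s}(n+1)&=a_{1s}(n)+2K_{1s}(n)\,x_g(n)\,\operatorname{sign}a_{2s}(n)\,\operatorname{sign}x(n),\\ a_{0s}(n+1)&=a_{0s}(n)+2K_{0s}(n)\,x_g(n)\,\operatorname{sign}a_{2s}(n)\,\operatorname{sign}x(n).\end{aligned} \tag{10.44}$$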


It is implied here that 


$$a(i)=\sum_{s=1}^{S}\bigl(a_{1s}\,i+a_{0s}\bigr)H(i,s).$$


The expression (10.44) is valid only under the aforementioned constraint on the weighting function of the second layer neuron. In the general case, the learning procedure has the following form:

$$a_{1s}(n+1)\,i+a_{0s}(n+1)=a_{1s}(n)\,i+a_{0s}(n)+2K_s(n,i)\,x_g(n)\,\operatorname{sign}x(n)\,F_s(i,n), \tag{10.45}$$


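where the function $F_s(i,n)$ depends on the output signals of the neuron layers that follow the current one. Consequently, if one denotes the increment, which is linear in i on the s-th interval, as

$$f_s(n,i)=2K_s(n,i)\,x_g(n)\,\operatorname{sign}x(n)\,F_s(i,n),$$

then, comparing the coefficients of i and the free terms on both sides of (10.45), one gets the following for the s-th interval:

$$\begin{aligned}a_{1s}(n+1)&=a_{1s}(n)+\frac{\partial f_s(n,i)}{\partial i},\\ a_{0s}(n+1)&=a_{0s}(n)+f_s(n,i)-i\,\frac{\partial f_s(n,i)}{\partial i}.\end{aligned} \tag{10.46}$$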
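Because the increment is linear in i on the interval, splitting it into a slope part and a free part is exact. A minimal sketch for one scalar component of the increment, assuming f_s is supplied as a hypothetical Python callable of i; the slope is taken as a finite difference, which equals the derivative for a linear function up to rounding.

```python
def split_linear_increment(f, i_ref, h=1.0):
    """Slope and free term of an increment f(i) that is linear in i on the interval."""
    slope = (f(i_ref + h) - f(i_ref)) / h        # d f_s / d i
    free = f(i_ref) - i_ref * slope              # f_s - i * d f_s / d i
    return slope, free


def update_10_46(a1, a0, f, i_ref):
    """Apply (10.46) to one component: returns (a_{1s}(n+1), a_{0s}(n+1))."""
    d1, d0 = split_linear_increment(f, i_ref)
    return a1 + d1, a0 + d0


f = lambda i: 0.5 * i - 0.25   # hypothetical increment f_s(n, i) on the s-th interval
a1_new, a0_new = update_10_46(1.0, 2.0, f, i_ref=1.5)
```

The updated line a_{1s}(n+1) i + a_{0s}(n+1) then coincides with a_{1s}(n) i + a_{0s}(n) + f_s(n, i) for every i in the interval, which is what (10.45) requires.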


The expressions (10.45) and (10.46) are the general adjustment algorithms for the piecewise linear weighting functions of the continuum neuron layer.

The form of matrix (10.43) is derived from expression (10.38) under the constraint (10.34a) on the weighting function of the second layer neuron, the definition of the vector functions $R_{n+1}(i)$, $R_n(i)$, and the condition of linearity of the weighting function at the s-th interval.

Indeed, it follows from the definition of the vector functions $R_{n+1}(i)$, $R_n(i)$ and the condition of linearity of the weighting function at the s-th interval that

$$\begin{aligned}R_s(n,i)&=R_{1s}(n)\,i+R_{0s}(n),\\ R_s(n+1,i)&=R_{1s}(n+1)\,i+R_{0s}(n+1).\end{aligned} \tag{10.47}$$


Expression (10.43) follows from (10.38) and (10.47). 
10.10
Continuum Neural Network Layer with Piecewise Constant Weighting 
Functions (the Case of Fixed “Footsteps”) 


10.10.1 
Open-Loop Layer Structure 


Let the weighting coefficients have the form shown in Fig. 10.4, and let the division points $\tau_s$ not be fixed. Let the amplitudes of the rectangles be fixed at the values $U_{ls}\Delta a$, where $U_{ls}$ is an integer and $\Delta a$ is a fixed value. The l-th weighting function of the layer has the following form:


$$a_l(i)=\sum_{s=1}^{S}U_{ls}\,\Delta a\,H(i,s), \tag{10.48}$$

where

$$H(i,s)=h(i-\tau_{s-1})-h(i-\tau_s).$$


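The output signal of the layer is

$$x(i)=\operatorname{sign}\Bigl[\Delta a\sum_{l=1}^{L}\sum_{s=1}^{S}U_{ls}H(i,s)\,x_l+\Delta a\sum_{s=1}^{S}U_{0s}H(i,s)\Bigr].$$

Consequently,

$$x(i)=\sum_{s=1}^{S}H(i,s)\,\operatorname{sign}\Bigl[\Delta a\Bigl(\sum_{l=1}^{L}U_{ls}x_l+U_{0s}\Bigr)\Bigr]. \tag{10.49}$$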


The structure (10.49) is similar to the structure (10.32a), except that the division points $\tau_s$, and hence the time intervals during which the corresponding neurons are connected to the input of the next layer, vary, while the weight amplitudes at the s-th interval are fixed. Thus one obtains an S-dimensional vector $\tau=(\tau_1,\dots,\tau_S)$ of variable parameters.
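In (10.49) the adjustable quantities are the division points rather than the amplitudes. The minimal sketch below assumes the left end of the first interval is τ_0 = 0, takes h(0) = 0, and uses a hypothetical layout `U[s-1][l-1]` for the integers U_{ls}. It shows that moving τ_1 changes which neuron is switched to the output at a given i.

```python
def step(v):
    return 1.0 if v > 0 else 0.0  # h(v); v == 0 mapped to 0 (assumption)


def sign(v):
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)


def fixed_step_layer(i, x, U, U0, delta_a, taus, tau0=0.0):
    """Output x(i) of (10.49) with integer amplitudes and adjustable division points.

    U[s-1][l-1] = U_{ls}, U0[s-1] = U_{0s}; taus = [tau_1, ..., tau_S];
    tau0 is the left end of the first interval (taken as 0 here, an assumption).
    """
    bounds = [tau0] + list(taus)
    total = 0.0
    for s in range(1, len(bounds)):
        h_is = step(i - bounds[s - 1]) - step(i - bounds[s])   # H(i, s)
        if h_is:
            net = delta_a * (sum(u * xl for u, xl in zip(U[s - 1], x)) + U0[s - 1])
            total += h_is * sign(net)
    return total


U, U0, delta_a = [[1], [-1]], [0, 0], 0.5
before = fixed_step_layer(1.2, [1.0], U, U0, delta_a, taus=[1.0, 2.0])
after = fixed_step_layer(1.2, [1.0], U, U0, delta_a, taus=[1.5, 2.0])
```

Shifting τ_1 from 1.0 to 1.5 hands the point i = 1.2 from the second neuron to the first, which flips the output; this is the effect the procedure of Sect. 10.10.2 exploits.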


10.10.2 
Recurrent Adjustment Procedure for Piecewise Constant Weighting Functions
with Variable Interval Lengths $\tau_s$
The recurrent adjustment procedure for the vector τ has the following form:

$$\tau(n+1)=\tau(n)-K\,\frac{\partial Y}{\partial\tau}\bigg|_{\tau=\tau(n)}, \tag{10.50}$$

where Y(n) is the secondary optimization functional. If

$$Y(n)=\bigl(\varepsilon(n)-y(n)\bigr)^2=x_g^2(n),$$

where y(n) is the output signal of the neural network and ε(n) is the supervisor instruction, then
$$y(n)=F[x(i)],$$
where x(i) is the output signal of the current layer and F is the transformation performed on x(i) by the subsequent neural network layers:

$$\frac{\partial Y}{\partial\tau_s}=-2x_g\,\frac{\partial y}{\partial x(i)}\,\frac{\partial x(i)}{\partial\tau_s}.$$


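Taking into account the form (10.49) of x(i):

$$\frac{\partial Y}{\partial\tau_s}=-2x_g\,\frac{\partial F}{\partial x(i)}\,\frac{\partial x(i)}{\partial H(i,s)}\,\frac{\partial H(i,s)}{\partial\tau_s}=-2x_g\,\frac{\partial F}{\partial x(i)}\,\operatorname{sign}\Bigl[\Delta a\Bigl(\sum_{l=1}^{L}U_{ls}x_l+U_{0s}\Bigr)\Bigr]\frac{\partial H(i,s)}{\partial\tau_s}. \tag{10.51}$$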
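It follows from the definition of the function H(i, s) that

$$\frac{\partial H(i,s)}{\partial\tau_s}=\frac{\partial\bigl[h(i-\tau_{s-1})-h(i-\tau_s)\bigr]}{\partial\tau_s}=-\frac{\partial h(i-\tau_s)}{\partial\tau_s}=-\frac{\partial h(i-\tau_s)}{\partial(i-\tau_s)}\,\frac{\partial(i-\tau_s)}{\partial\tau_s},\qquad\frac{\partial(i-\tau_s)}{\partial\tau_s}=-1.$$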
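Since the step function h has no derivative in the ordinary sense, it is necessary to use only the information about the sign of the first derivative, $\operatorname{sign}\bigl(\partial H(i,s)/\partial\tau_s\bigr)$, which depends on the position of i relative to $\tau_s$.

Finally,

$$\frac{\partial Y}{\partial\tau_s}\approx-2x_g\,\frac{\partial F}{\partial x(i)}\,\operatorname{sign}\Bigl[\Delta a\Bigl(\sum_{l=1}^{L}U_{ls}x_l+U_{0s}\Bigr)\Bigr]\operatorname{sign}\frac{\partial H(i,s)}{\partial\tau_s}. \tag{10.52}$$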
Chapter 11


Selection of Initial Conditions During Neural Network 
Adjustment - Typical Neural Network Input Signals 


11.1 
About Selection Methods for Initial Conditions 


The homogeneous neural network adjusted to the solution of some specific problem 
represents a dynamic system described by difference or differential (linear or nonlin- 
ear) equations. Therefore, the problem of selecting initial conditions for the system 
adjustment is an important part of the neural network theory. The quality of such 
selection significantly influences the quality of the solution. Usually this aspect is not 
considered. 

It seems that only Rosenblatt [11-1] mentioned this problem. However, he used zero initial values of the coefficients in all the experiments. This does not guarantee reaching an extremum with a satisfactory value of the optimization functional, especially in problems with multi-extremum optimization functionals.

One can consider two methods of selecting such initial conditions: selection of random initial conditions and selection of deterministic initial conditions. The first method is motivated by the multi-extremum character of the secondary optimization functional. Random elements are introduced into the search for the extremum of the secondary optimization functional in order to find both its local and its global extrema. The search for local extrema is necessary for solving the problem of minimizing the structure of the multilayer neural network. At the first stage of using random initial conditions, one gets the impression that there are too many local extrema in the space of adjustable coefficients. However, enlarging the open-loop neural network structure increases the number of states of the multilayer neural network, which is estimated by the value of the secondary optimization functional. This means that the majority of the local extrema provide the same recognition quality. This remark must be taken into account in the methods, described below, that estimate the quality of a multilayer neural network by the value of the secondary optimization functional. The experimental results obtained in this chapter and the above remark confirm the validity of adjustment with random initial conditions, even though this approach introduces redundancy into the neural network adjustment time.
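The random-initial-condition method can be illustrated on a toy problem. The sketch below is a swapped-in illustration, not the book's network: a hypothetical one-dimensional functional with two local minima of equal value stands in for the secondary optimization functional, and plain gradient descent stands in for the adjustment procedure. Every random start reaches some local extremum, and all of them give the same functional value, mirroring the remark that most local extrema provide the same quality.

```python
import random


def local_descent(grad, x, lr=0.01, steps=2000):
    """Plain gradient descent on a one-dimensional functional from one start point."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def random_multistart(f, grad, lo, hi, n_starts, seed=0):
    """Draw n_starts random initial points uniformly in [lo, hi], descend from each,
    and return (final point, functional value) for every start."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_starts):
        x_final = local_descent(grad, rng.uniform(lo, hi))
        results.append((x_final, f(x_final)))
    return results


# Toy multi-extremum functional (hypothetical): minima at x = -1 and x = +1, both with value 0.
f = lambda x: (x * x - 1.0) ** 2
grad = lambda x: 4.0 * x * (x * x - 1.0)
runs = random_multistart(f, grad, -2.0, 2.0, n_starts=20, seed=1)
```

The redundancy mentioned in the text is visible here as well: twenty full descents are run where one would suffice if the right starting region were known in advance.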

The goal of the second method, with deterministic initial conditions, is to bring the neural network a priori into the region of one of the local extrema of the secondary optimization functional. The multilayer neural network must be maximally amorphous at the level of the first, second, etc., layers, i.e., it must be ready to solve the most complex recognition problem (from the point of view of the modality of f(x)). A preliminary variant of the possible structure of the divisional surface in this case, for the problem of learning to recognize two pattern classes, is shown in Fig. 11.1. The final variant can be determined only after introducing a criterion of amorphism or dispersion.

Fig. 11.1. Divisional surface at the selection of initial conditions: 1 - the first class; 2 - the second class

It is evident that the multilayer neural network with the lowest amorphism and dispersion is the neural network with equal coefficients of the first-layer neurons and with the corresponding divisional surfaces shifted to the “margin” of the feature space. The physically implemented region of the feature space is shown in Fig. 11.1 by the dotted line. The same holds in the self-learning mode if there is no preliminary information about which class each cell in Fig. 11.1 belongs to. The initial conditions for the adjustable coefficients of the second, etc., layers are calculated from the geometry of the divisional surface realized by the first-layer neurons, together with the instruction indicating which class each initial region of the feature space belongs to.

The initial conditions are selected according to a priori information about the distributions, for an already known structure of the open-loop neural network and a selected optimization functional. The selection of initial conditions concerns not only the choice of the vector a(0) but also the method of calculating the elements of the parameter matrix K at each step of the procedure. Methods of selecting initial conditions can be classified by the kind of information they use:


* Random initial conditions without use of the learning sample;
* Deterministic initial conditions without use of the learning sample;
* Initial conditions with use of the learning sample.


11.2 
Algorithm of Deterministic Selection of the Initial Conditions 
in the Adjustment Algorithms for Multilayer Neural Networks 


This algorithm uses a priori information about the configuration of the modes in the feature space, their number, and their variances.
Let the feature space be normalized to the unit hypercube

$$K=\{x:0\le x_i\le 1,\ i=1,\dots,N\},\qquad x_j\in K\quad (j=1,\dots,M), \tag{11.1}$$
where $x_j$ is the feature vector of the j-th pattern of the learning sample, N is the space dimensionality, and M is the learning sample length. Let us consider the case of two classes with $K_1$ and $K_2$ modes, respectively. The modes of the corresponding classes are denoted

$$r_{1i}\ (i=1,\dots,K_1),\qquad r_{2i}\ (i=1,\dots,K_2).$$
Let us arrange the projections of the modes on each coordinate axis into monotone (ascending) sequences, separately for each class s = 1, 2 and for each axis j = 1, ..., N (11.2), where $i_1$ and $i_2$ denote the numbers of the modes of the first and second classes.
Let us consider differences of the following form:

$$\bigl|r^{\,j}_{1i_1}-r^{\,j}_{2i_2}\bigr|,\qquad j=1,\dots,N. \tag{11.3}$$


Let $\sigma^{\,j}_{1i_1}$ and $\sigma^{\,j}_{2i_2}$ be the estimates of the variances of the corresponding modes along the j-th axis, and let the following condition be valid:

$$\sigma^{\,j}_{1i_1}+\sigma^{\,j}_{2i_2}<\bigl|r^{\,j}_{1i_1}-r^{\,j}_{2i_2}\bigr|. \tag{11.4}$$

Then a hyperplane perpendicular to the j-th coordinate axis is drawn through the middle of the segment

$$\bigl[r^{\,j}_{1i_1},\,r^{\,j}_{2i_2}\bigr]. \tag{11.5}$$

If the drawn hyperplane also separates the projections of other modes, then it is drawn through the point obtained by averaging the midpoints of the corresponding segments (11.5) whose ends are separated by this hyperplane. If condition (11.4) is not fulfilled for a given pair of modes, the hyperplane is not drawn.
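The rule above can be sketched in Python for one possible reading of it. The sketch assumes condition (11.4) in the form "sum of spreads < distance between projections", and it groups segments greedily from left to right: a candidate hyperplane at a segment midpoint absorbs every other segment whose ends it separates, and the final threshold is the average of the absorbed midpoints. This matches the flowchart steps X(K,I) = MX - RR/2 (the midpoint) and A(IK,I) = SUM/I1 (the averaging), but the grouping order and all names are assumptions.

```python
def select_hyperplanes(r1, sgm1, r2, sgm2):
    """Deterministic initial hyperplanes x[axis] = threshold for two classes.

    r1, sgm1: first-class mode centres and spreads, one row of N values per mode.
    r2, sgm2: the same for the second class.
    Returns a list of (axis, threshold) pairs.
    """
    n_axes = len(r1[0])
    planes = []
    for axis in range(n_axes):
        # Segments (11.5) between opposite-class mode projections that satisfy (11.4).
        segments = []
        for c1, s1 in zip(r1, sgm1):
            for c2, s2 in zip(r2, sgm2):
                lo, hi = sorted((c1[axis], c2[axis]))
                if s1[axis] + s2[axis] < hi - lo:
                    segments.append((lo, hi, (lo + hi) / 2))
        used = [False] * len(segments)
        for k in sorted(range(len(segments)), key=lambda j: segments[j][2]):
            if used[k]:
                continue
            mid = segments[k][2]
            # Every unused segment whose ends a hyperplane at `mid` would also separate
            # (segment k itself always qualifies, since its ends are distinct).
            group = [j for j, (lo, hi, _) in enumerate(segments)
                     if not used[j] and lo < mid < hi]
            for j in group:
                used[j] = True
            planes.append((axis, sum(segments[j][2] for j in group) / len(group)))
    return planes


# Hypothetical configuration: two first-class modes and one second-class mode on one axis.
planes = select_hyperplanes([[0.1], [0.9]], [[0.05], [0.05]], [[0.5]], [[0.05]])
```

Here two hyperplanes are drawn, at about 0.3 and 0.7, because neither separates the other's segment; if both first-class modes lay on the same side of the second-class mode, a single averaged hyperplane would result.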

Let us consider the example shown in Fig. 11.2 for illustration. The mode configuration has the form shown in the figure.

One checks conditions (11.4) for $x_1$. As a result, one draws hyperplane 1 through the middle of the segment $R^1_1R^1_2$, where $R^1_1$ and $R^1_2$ are the midpoints of the segments (11.5) formed by the corresponding pairs of mode projections on the $x_1$ axis.

Checking similarly for $x_2$, one draws hyperplane 2 through the middle of the segment $R^2_1R^2_2$, whose ends are defined in the same way on the $x_2$ axis.
Fig. 11.2. Example of the use of the algorithm for deterministic selection of the initial conditions
Table 11.1. Parameter description

Parameter   Description
R1          Matrix for estimation of coordinates of first-class mode centers
R2          Matrix for estimation of coordinates of second-class mode centers (projections of mode centers on the i-th coordinate axis are located in the i-th column)
SGM1        Matrix for estimation of first-class mode variances
SGM2        Matrix for estimation of second-class mode variances
M1          Number of first-class modes
M2          Number of second-class modes
N           Dimensionality of feature space
R3          Matrix obtained by combination of matrices R1 and R2 by columns (M3 = M1 + M2)
X           Matrix of points on coordinate axes through which hyperplanes are drawn
K           Number of points on current coordinate axis
I           Number of coordinate axes
A           Matrix of drawn hyperplane coefficients
IK          Number of hyperplanes

Then the first and third sections can be considered as belonging to the first class, and the second and fourth sections as belonging to the second class. The error is determined by the hatched regions.

The block scheme of the program that realizes this algorithm and the parameter description are given in Fig. 11.3 and Table 11.1, respectively.
[Fig. 11.3 flowchart steps: input R1(M1, N), SGM1(M1, N), R2(M2, N), SGM2(M2, N); arrangement of elements of the I-th column of matrix R3(M3, N); I = I + 1; RR = ABS(R1(J,I) - R2(JJ,I)); K = K + 1; MX = AMAX(R1(J,I), R2(JJ,I)); X(K,I) = MX - RR/2; SUM = SUM + X(K,I); IK = IK + 1; A(IK,I) = SUM/I1; PRINT; STOP]

Fig. 11.3. Block scheme of the program realizing the algorithm of deterministic selection of the initial conditions
11.3
Selection of Initial Conditions in Multilayer Neural Networks 


The problem of initial condition selection in multilayer neural networks is divided into a step-by-step selection of the initial conditions for the first, second, etc., layers. The first layer was considered above. Let, for example, the coefficients of the first layer be determined by deterministic selection, so that a set of sections is obtained. Each section corresponds to a code consisting of +1 and -1 values. If this set places the divisional surface close to the optimal one (the recognition error is small), then the probability that this section configuration will change is small. Estimates of this probability are given below. With this probability, the learning samples for the second, etc., layers are preserved. The obtained coefficients of these layers can then remain constant if they correctly decompose the output space of the previous layer.

Figure 11.4 shows examples of the coefficient values of the second- and third-layer neurons depending on how class numbers are distributed across sections, when the configuration does not change. In cases a-d, the logic function is realized by a neuron of the second layer. In case e, a three-layer neural network is realized.

Let us consider the problem of coefficient fixation in more detail. Let a piecewise linear surface be realized in the feature space. To form the learning sample, each section must be assigned a class number.


Fig. 11.4. Example of the coefficient values of the second and third neuron layers depending on the distribution of class numbers across sections
This can be done as follows. First, one establishes the correspondence between each element of the learning sample and a section number. Then, in each section, one determines the class with the largest number of patterns in that section, and this class number is assigned to the section. Next, it is necessary to select optimal coefficients for the second layer. Since the piecewise linear surface in the majority of diagnostic problems is not very complicated, one can obtain experimental tables of correspondence between the second-layer coefficients and the given configuration. Though this task is rather complex and time-consuming, its solution reduces the initial condition selection to the deterministic selection of the initial conditions for the first layer and the fixation of the coefficients of the following layers using such tables. An estimate of the probability that the configuration in the feature space is preserved is given below.
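The assignment of class numbers to sections described above can be sketched as follows. The sketch assumes first-layer neurons of the form sign(w · x + w0), with a zero net input mapped to -1, and resolves ties between classes arbitrarily (by the order returned from `Counter.most_common`); the function names and the example data are hypothetical.

```python
from collections import Counter, defaultdict


def section_code(x, planes):
    """Section number of pattern x: a tuple of +1/-1, one entry per first-layer neuron.

    planes: list of (w, w0); the neuron output is sign(w . x + w0), with 0 mapped to -1.
    """
    return tuple(1 if sum(wi * xi for wi, xi in zip(w, x)) + w0 > 0 else -1
                 for w, w0 in planes)


def assign_classes_to_sections(samples, labels, planes):
    """Assign to every occupied section the class with the most patterns in it."""
    counts = defaultdict(Counter)
    for x, c in zip(samples, labels):
        counts[section_code(x, planes)][c] += 1
    return {code: cnt.most_common(1)[0][0] for code, cnt in counts.items()}


# Hypothetical first layer: hyperplanes x1 = 0.5 and x2 = 0.5 in the unit square.
planes = [([1.0, 0.0], -0.5), ([0.0, 1.0], -0.5)]
samples = [(0.2, 0.2), (0.3, 0.1), (0.8, 0.2), (0.9, 0.3), (0.7, 0.1), (0.8, 0.9)]
labels = [1, 1, 2, 2, 1, 1]
section_classes = assign_classes_to_sections(samples, labels, planes)
```

The resulting mapping from section codes to class numbers is the learning sample for the second layer; the pattern (0.7, 0.1) becomes an error because its section is assigned to the second class.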

Let us consider the deterministic selection of the initial conditions for a multilayer neural network with three neurons in the first layer, shown in Fig. 11.5. Table 11.2 gives the values of the logic function Y whose arguments are the first-layer outputs; the values are given on the sections of the given configuration. Here $y_i$ (i = 1, 2, 3) are the outputs of the first-layer neurons. Such a function can be realized by two neurons of the second layer with the coefficients

$$a_1=\{1,\ 1,\ 1,\ -2\},\qquad a_2=\{-1,\ -1,\ -1,\ 2\}. \tag{11.6}$$


Sections in Table 11.2 can be modified in the process of further learning. The number of sections that can additionally appear is $2^{H_1}-m$, where $H_1$ is the number of neurons in the first layer and m is the number of sections selected as initial conditions. In our example, two sections, [+1, +1, -1] and [-1, +1, -1], can appear. Under the fixed coefficients of the first layer, one obtains that the new sections belong to the first class. It is seen from Fig. 11.4 that the first of these sections increases the recognition error, because an element of the second class will appear in it, while the second section does not increase the error. The classification error can increase, as a result of fixing the coefficients of all the layers above the first one, in two situations: first, if the new sections should be regarded as belonging to the other pattern class, and second, if an old section that changes its configuration in the learning process receives more patterns of the other class than before.

Fig. 11.5. Example of the deterministic selection of the initial conditions for the multilayer neural network with three neurons in the first layer

Table 11.2. Values of the logic function Y with the first layer outputs as its arguments
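The behaviour of the two second-layer neurons (11.6) on every possible first-layer output can be tabulated directly. The sketch assumes that the fourth coefficient in each set is the threshold, applied to a constant input of +1, and that the neuron output is sign(net); the net input is never zero for ±1 inputs with these coefficients.

```python
from itertools import product


def threshold_neuron(coeffs, y):
    """sign(a_1*y_1 + ... + a_H*y_H + a_{H+1}); the last coefficient is the threshold."""
    net = sum(a * yi for a, yi in zip(coeffs[:-1], y)) + coeffs[-1]
    return 1 if net > 0 else -1


a1 = [1, 1, 1, -2]
a2 = [-1, -1, -1, 2]
# Outputs (neuron a1, neuron a2) for all 2**3 first-layer codes (y1, y2, y3).
table = {y: (threshold_neuron(a1, y), threshold_neuron(a2, y))
         for y in product((-1, 1), repeat=3)}
```

Under this reading, the neuron with a_1 outputs +1 only on the section [+1, +1, +1], and the neuron with a_2 gives the complementary output; the two sections that can appear, [+1, +1, -1] and [-1, +1, -1], are therefore classified exactly like the other sections outside [+1, +1, +1].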

Let us consider the section $e^j=\{e^j_1,\dots,e^j_{H_1}\}$, j = 1, ..., m. Let the section number $e^j$ correspond to the first class, and let this section receive $S_j$ elements of the second class in the process of initial condition selection. Then the probability of wrong classification due to the surface of this section is $S_j/M_j$, where $M_j$ is the number of patterns in the section $e^j$. Further closed-cycle learning must involve those hyperplanes whose change will decrease the error probability $S_j/M_j$. The probability of the section deformation in the learning procedure can be defined as

$$p_j=P(S_j/M_j). \tag{11.7}$$
Let us assume that the probability of section deformation depends linearly on the probability $S_j/M_j$:

$$p_j=\frac{S_j}{M_j}, \tag{11.8}$$

and that the patterns are uniformly distributed over the section with density $S_0$, so that $S_j\approx S_0V_j$, where $V_j$ is the section volume. Then

$$p_j\approx\frac{S_0V_j}{M_j}. \tag{11.9}$$

Evidently, $p_j\to 0$ if $V_j\to 0$ with $S_0=\text{const}$, and $p_j\to 0$ if $S_0\to 0$ with $V_j=\text{const}$. This means that the probability of deformation of a section that does not contribute to the recognition error is zero, and that, under a constant distribution density $S_0$, a decrease of the section volume decreases the number of elements $S_j$ it receives, thus decreasing the probability (11.9). In the case $V_j\to V_0$, one gets $p_j\to 1/2$ (under the assumption that the sample includes M/2 elements of each of the two classes). Averaging (11.9) over all m sections gives

$$\bar p_m=\frac{1}{m}\sum_{j=1}^{m}p_j,$$

and, taking into account that

$$p_j\le\frac{1}{2}$$

for each $p_j$ (each section is assigned the class that has the majority of its patterns, so $S_j\le M_j/2$), one gets the estimate of the average section deformation probability for the considered configuration:

$$\bar p_m\le\frac{1}{2}.$$
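The per-section estimates (11.8) and their average can be computed directly from the class counts in each section. A minimal sketch, assuming the counts are supplied as a hypothetical dictionary of `Counter` objects and that each section is assigned its majority class, so that $S_j$ is the number of patterns outside that class:

```python
from collections import Counter


def deformation_estimates(section_counts):
    """Per-section p_j = S_j / M_j as in (11.8) and their average over the m sections.

    section_counts: {section: Counter(class -> number of patterns in the section)}.
    S_j counts the patterns that do not belong to the section's majority class.
    """
    p = {}
    for sec, cnt in section_counts.items():
        m_j = sum(cnt.values())
        s_j = m_j - max(cnt.values())
        p[sec] = s_j / m_j
    p_avg = sum(p.values()) / len(p)
    return p, p_avg


counts = {"A": Counter({1: 8, 2: 2}), "B": Counter({1: 5, 2: 5}), "C": Counter({1: 4})}
p, p_avg = deformation_estimates(counts)
```

Section B, with an even split between the classes, reaches the worst case $p_j = 1/2$, while the average stays below the bound $\bar p_m\le 1/2$.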


As a result, it is possible to estimate the probability that the already existing sections will be deformed in the learning process at the stage of initial condition selection for the first-layer neurons, i.e., to estimate whether the use of fixed coefficients in the following layers is justified.

Taking into account the above considerations, one can present the initial condition 
selection method for the multilayer neural networks. 


1. The piecewise linear surface is drawn in the feature space with the help of the deterministic algorithm of initial condition selection for the first-layer neurons;

2. The correspondence between the i-th pattern of the learning sample and the section number $e^j$ (j = 1, ..., m) is found by presenting the whole learning sample to the first layer;

3. A teacher instruction $E_i$ (i = 1, ..., K), where K is the number of pattern classes, is assigned to each section number $e^j$. For this purpose, the values $Y_{ij}$ (i = 1, ..., p) are calculated for each section ($Y_{ij}$ is the number of patterns of the i-th class that occurred in the j-th section; p is the number of classes whose patterns occurred in the j-th section), and the maximum value $\max_i Y_{ij}$ is found. The corresponding $E_i$ is the required teacher instruction;

4. The probability of considering the j-th section as belonging to the i-th pattern class is calculated:

$$P_j=1-\frac{S_j}{M_j},\qquad S_j=M_j-\max_i Y_{ij},\qquad M_j=\sum_{i=1}^{p}Y_{ij};$$

5. It is checked whether the logic function can be realized by one neuron of the second layer. If it can, the initial condition selection is terminated at the stage of second-layer neuron learning;

6. Otherwise, the initial conditions are selected for the second-layer neurons either in a way similar to step 1 or with the help of tables of correspondence between the second-layer coefficients and the specified section configurations;

7. A similar procedure is used for the following layers;

8. The sub-samples for first-layer learning are formed: for each section $e^j$ one checks successively which of the hyperplanes forming this section contribute to the error $S_j$. To do this, each hyperplane is shifted by $\pm r_j$ ($r_j>\bar r_j$), where $\bar r_j$ is the mean distance between the nearest patterns that occurred in the given section, and the change of $S_j$ is checked. The hyperplanes selected in this way must be subjected to further closed-cycle learning.
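Step 8 can be sketched for the simple case of hyperplanes perpendicular to the coordinate axes, as produced by the deterministic algorithm of Sect. 11.2. The sketch makes two simplifying assumptions: it uses one shift `r` for all sections instead of a per-section $r_j$, and it tracks the total error $\sum_j S_j$ (with majority-class teacher instructions, as in step 3) rather than each $S_j$ separately. A hyperplane is selected for further learning if either shift lowers that total; all names are hypothetical.

```python
from collections import Counter, defaultdict


def total_error(samples, labels, planes):
    """Sum over sections of S_j = M_j - max_i Y_ij for axis-aligned hyperplanes.

    planes: list of (axis, threshold); the neuron output is +1 if x[axis] > threshold.
    """
    counts = defaultdict(Counter)
    for x, c in zip(samples, labels):
        code = tuple(1 if x[axis] > t else -1 for axis, t in planes)
        counts[code][c] += 1
    return sum(sum(cnt.values()) - max(cnt.values()) for cnt in counts.values())


def hyperplanes_to_retrain(samples, labels, planes, r):
    """Indices of hyperplanes for which a shift by +r or -r lowers the total error."""
    base = total_error(samples, labels, planes)
    selected = []
    for k, (axis, t) in enumerate(planes):
        for shift in (r, -r):
            moved = planes[:k] + [(axis, t + shift)] + planes[k + 1:]
            if total_error(samples, labels, moved) < base:
                selected.append(k)
                break
    return selected


samples = [(0.1, 0.0), (0.2, 0.0), (0.35, 0.0), (0.6, 0.0), (0.7, 0.0)]
labels = [1, 1, 1, 2, 2]
planes = [(0, 0.3), (1, 5.0)]
to_retrain = hyperplanes_to_retrain(samples, labels, planes, r=0.1)
```

Only the first hyperplane is returned: shifting it to 0.4 removes the misclassified pattern, whereas the second hyperplane lies far from all patterns and its shifts change nothing, so it would be left out of further closed-cycle learning.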
Partial learning of the first-layer neurons provides two benefits: first, hyperplanes that are already optimally located are not retrained; second, the learning time is reduced.

The method described above reduces multilayer neural network learning to the learning of the first neuron layer.


11.4 
Initial Condition Formation for Neural Network Coefficient Setting 
in Different Problems of Optimization 


Generally, multilayer neural networks that solve mathematical problems form a multi-extremum optimization functional; the best, and often the only, solution is reached when the adjustment process brings the system to the global extremum of this functional. In the overwhelming majority of works, the initial conditions are suggested to be selected by one of the following methods:

* Choosing zero initial conditions;
* Choosing random initial conditions.


In this work, it is suggested that the initial conditions for adjusting the weight coefficients of the multilayer neural network be chosen in a way specific to each mathematical problem being solved. In the first place, this concerns the following problems:

* Systems of algebraic equalities;
* Systems of algebraic inequalities;
* Approximation and extrapolation of functions;
* Pattern recognition, as a particular problem of function approximation;
* Clustering or self-learning;
* Optimization;
* Dynamic object modeling.


In the future, the number of problems solved in the neural network logical basis will increase [11-5, 11-6], and methods of initial condition formation for multilayer neural network adjustment will correspondingly be developed for other problems.

The problem of initial condition choice (formation) is in turn divided into two parts:

* Formation of the main idea (algorithm) of the choice of initial conditions;
* Calculation of the weight coefficients of the neural network of the chosen structure, to be used as initial values for adjustment in the adaptive mode.


Some problems of initial condition choice are only stated in this work, without a final solution; in our opinion, this is also important for the correct orientation of researchers in this field.

Here the main goal is to state the problem of initial condition formation for coefficient adjustment in multilayer neural networks, using a specific method for each concrete problem.
11.4.1
Linear Equality Systems 


The main idea of the choice of initial conditions for the two variants of the investigated neural network algorithms for solving systems of algebraic equalities is presented in Figs. 11.6a and 11.6b.

In the first version of the neural network algorithm for solving systems of algebraic equalities, in the case of a one-layer neural network, it is proposed to calculate the initial conditions of the vector X by solving N equalities in one variable, whose coefficients are formed by extracting the (main) diagonal of matrix A, or N equalities in two variables, whose coefficients are formed by extracting two diagonals of matrix A (the main one and the adjacent one above or below it). The version with three diagonals of matrix A (the main one and the two adjacent ones) can be considered separately. The number of diagonals extracted is determined by the admissible time for solving the diagonal system by ordinary methods to calculate X(0), before the subsequent solution of the linear system by the neural network algorithm. As mentioned above numerous times, this procedure is effective for high-dimensional systems. In this case, the problem is the choice of initial conditions for the weight coefficients in the second, third, etc. layers of the multilayer neural network that solves the system of algebraic equalities in the structure presented in Fig. 11.6a.

In [11-6], a modified approach to neural network algorithms for solving systems of algebraic equalities is proposed; it is presented in Fig. 11.6b. Here the neural network input is the matrix A and the vector b, and the output is the required solution X. In this case, the main problem is to calculate the neural network coefficients so that the network output equals the solution X(0) found by the first method of solving the system of algebraic equalities, i.e., to satisfy the relation X(0) = F{W(0)}, where F is the transformation performed by the neural network and W(0) is the set of initial values of the weight coefficients of the neural network of a defined structure.
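The diagonal-based computation of X(0) described for the first version can be sketched as follows. The sketch shows the one-diagonal variant and the two-diagonal variant that uses the main diagonal and the one just below it (solved by forward substitution); it assumes a nonzero main diagonal, and the function names and example system are hypothetical.

```python
def x0_main_diagonal(A, b):
    """X(0) from N one-variable equations a_ii * x_i = b_i (main diagonal of A only)."""
    return [b[i] / A[i][i] for i in range(len(b))]


def x0_two_diagonals(A, b):
    """X(0) from N two-variable equations a_{i,i-1} x_{i-1} + a_ii x_i = b_i,
    i.e. the main diagonal and the one just below it, solved by forward substitution."""
    x = []
    for i in range(len(b)):
        rhs = b[i] - (A[i][i - 1] * x[i - 1] if i > 0 else 0.0)
        x.append(rhs / A[i][i])
    return x


# Hypothetical 3 x 3 system whose exact solution is [1, 1, 1].
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]
x_diag = x0_main_diagonal(A, b)
x_two = x0_two_diagonals(A, b)
```

Using more diagonals costs more arithmetic but gives a starting point closer to the exact solution, which is the trade-off against the admissible time mentioned in the text.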


11.4.2 
Linear Inequality Systems 


When solving systems of linear inequalities, the initial conditions for adjusting the multilayer neural network coefficients can be chosen in the same way as for the solution of a system of linear equalities, with the inequality signs in the system replaced by equality signs.


[Fig. 11.6 labels: X(0); A, b; NN; setting algorithm (panels a and b)]

Fig. 11.6. Initial condition choice for the two variants of the investigated neural network algorithms
11.4.3
Approximation and Extrapolation of Functions


It is widely believed that neural networks are very effective approximators and extrapolators of functions. This is directly connected with the natural development of the theory of signal filtering and extrapolation from linear theory, which requires a considerable amount of a priori information about useful signals and noise, to nonlinear theory. In this connection, the application of neural networks to filtering and signal extrapolation includes and generalizes a great number of attempts to solve the nonlinear processing problem by other methods. Neural networks solve filtering and signal extrapolation problems for complex and unknown characteristics of the useful signal, which are often time-varying, and for time-varying noise characteristics. Within the framework of neural networks, the first attempts are being made to build multivariable filters and extrapolators.

From our point of view, the natural choice of initial conditions for adjusting a multilayer neural network in adaptive approximators and function extrapolators is the construction of an equivalent linear filter. The linear discrete Zadeh-Ragazzini filter is a z-filter whose order equals the memory of the filtering or extrapolation system, with coefficients calculated as functions of the memory N and the extrapolation time d. Because the linear filter is constructed from a priori information about the known functional form of the signal and about the noise added to the useful signal (in particular, white noise), several sets of filter weight coefficients can exist for useful signals of different complexity, calculated with correspondingly different computational complexity. The main problem is to convert the z-filter coefficients into the corresponding initial coefficients of a neural network of fixed structure.


11.4.4 
Pattern Recognition 


The problem of choosing initial conditions for adjusting an adaptive neural network that solves a pattern recognition problem was first considered in [11-2]. That work considers two methods: choosing random initial conditions and choosing deterministic initial conditions. Random initial conditions are used because the secondary optimization functional is multi-extremal. This follows from the multimodality of the input signal distribution f(x) together with the limitations of the open-loop neural network structure. Random elements are introduced into the extremum search for the secondary optimization functional because both the local and the global extrema of this functional must be found. The local extremum search is needed to minimize the multilayer neural network structure by analyzing the adjustment results. In the first phase of using random initial conditions (and in the subsequent phase of averaging the adjustment results over many random draws of the initial conditions), the secondary optimization functional appears to have a great number of local extrema in the space of adjustable coefficients. However, as the open-loop neural network structure becomes more complex, the number of multilayer neural network states that give the same value of the secondary optimization functional increases. In other words, most local extrema of the functional in the space of adjustable coefficients ensure the same recognition quality. This remark should be related to the methods, described below, that estimate multilayer neural network quality by the value of the secondary optimization functional computed from the current signals in the neural network. Taking all this into account, adjustment from random initial conditions can be considered sound. It does, however, add redundancy to the adjustment time, since the input signal must be learned fully (in particular, the global extremum of the functional must be found).
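A minimal sketch of the random-initial-conditions approach follows. It uses a hypothetical one-dimensional multi-extremal function in place of the secondary optimization functional. Gradient descent is started from many random initial conditions, the end points are grouped into distinct local minima, and the best end point is taken as the estimate of the global extremum. All function names and numeric settings are illustrative assumptions.

```python
import numpy as np

def functional(a):
    """Hypothetical multi-extremal stand-in for the secondary optimization functional."""
    return np.sin(3.0 * a) + 0.1 * a ** 2

def functional_gradient(a):
    return 3.0 * np.cos(3.0 * a) + 0.2 * a

def descend(a, step=0.01, iterations=2000):
    """Plain gradient descent; works element-wise on an array of initial conditions."""
    for _ in range(iterations):
        a = a - step * functional_gradient(a)
    return a

def multistart(n_starts=100, low=-6.0, high=6.0, seed=0):
    """Descend from random initial conditions; return the distinct local minima found
    and the best end point (the estimate of the global minimum)."""
    rng = np.random.default_rng(seed)
    ends = descend(rng.uniform(low, high, n_starts))
    minima = np.unique(np.round(ends, 2))      # end points in the same basin coincide
    best = ends[np.argmin(functional(ends))]
    return minima, best

minima, best = multistart()
```

The extra runs are the redundancy in adjustment time mentioned above. Most starts end in non-global minima, but taken together they locate the global one.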

The purpose of deterministic initial conditions is to bring the neural network a priori into the region of one of the local extrema of the secondary optimization functional in the space of adjustable coefficients. At the geometric level, the first, second, and subsequent layers of the multilayer neural network should be made maximally "amorphous" (distributed). In other words, they should be prepared to solve the most difficult recognition problem in terms of the modality of f(x). Fig. 11.7 shows a conceivable configuration of the dividing surface when learning to recognize patterns of two classes, although it is only a preliminary version. The final version can be determined only after introducing a criterion of amorphousness (distribution). Obviously, the least amorphous and distributed multilayer neural network is one in which all coefficients of the first neuron layer are identical. In that case, the corresponding discriminating surfaces are pushed to the edge of the feature space. In Fig. 11.7, the physically realizable region of the feature space is indicated by a dotted line. The same applies to the self-learning mode, in which the assignment of the cells in Fig. 11.7 to one class or another is not specified in advance. The initial conditions for the adjustable coefficients of the second and subsequent layers are determined by two things: the geometry of the dividing surface realized by the first-layer neurons, and the assignment of the initial feature space regions to one class or another. One method of choosing deterministic initial conditions was proposed in [11-2].

In conclusion, the methods used to choose initial conditions can be classified by the type of information they use:


- Random choice of initial conditions without the use of training samples;
- Deterministic choice of initial conditions without the use of training samples.


Fig. 11.7.

Methods of choosing initial conditions with the use of training samples rely on taking the results of training a neural network of variable structure as the initial coefficients of a neural network of fixed structure [11-2, 11-4].


11.4.5 
Clusterization 


The problem of choosing initial conditions for adjusting an adaptive neural network that solves the clusterization problem was considered in [11-2]. For multilayer neural networks solving this problem, a natural way to choose the initial conditions for adaptation is to construct an initial dividing surface in the multidimensional feature space, as shown in Fig. 11.7, without indicating which regions belong to which class. The weight coefficients of the first and subsequent neuron layers are formed from the geometry of this assumed dividing surface. The surface divides the multidimensional feature space into equal regions corresponding to the expected clusters.
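One possible deterministic initialization for clusterization is sketched below. It rests on assumptions of its own, not on the method of [11-2]: the first-layer neurons realize axis-parallel hyperplanes on a uniform grid, so the box [low, high]^N is split into equal cells, and each cell is identified by the binary code of the first-layer outputs. A second layer would then group cells into the expected clusters. `grid_first_layer` and `cell_code` are hypothetical names.

```python
import numpy as np

def grid_first_layer(low, high, dim, cuts_per_axis):
    """Weights and thresholds of first-layer neurons whose hyperplanes x_j = c split the
    box [low, high]**dim into (cuts_per_axis + 1)**dim equal cells (hypothetical scheme)."""
    cuts = np.linspace(low, high, cuts_per_axis + 2)[1:-1]   # interior cut positions
    weights, thresholds = [], []
    for j in range(dim):
        for c in cuts:
            w = np.zeros(dim)
            w[j] = 1.0
            weights.append(w)
            thresholds.append(-c)          # neuron fires when w . x + a0 > 0, i.e. x_j > c
    return np.array(weights), np.array(thresholds)

def cell_code(x, weights, thresholds):
    """Binary code of the cell containing x: one bit per first-layer neuron."""
    return tuple(int(b) for b in (weights @ x + thresholds > 0))
```

With two cuts per axis in two dimensions, four neurons separate nine equal cells, and every cell receives a distinct code.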


11.4.6 
Traveling Salesman Problem 


The traveling salesman problem is a particular problem of linear programming. Linear programming is itself a particular optimization problem, along with quadratic and nonlinear programming. The traveling salesman problem consists in finding the shortest route that leaves a given town, visits each of the N towns once, and returns to the starting town.

One possible choice of initial conditions (the initial route) for the traveling salesman problem is built as follows: go from the starting town to the town nearest to it, then to the unvisited town nearest to the chosen one, and so on.
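The nearest-neighbour rule described above can be sketched as follows. It produces a feasible initial route for a subsequent optimizing procedure, not an optimal route. The sketch assumes that towns are points in the plane with Euclidean distances and breaks ties by the smaller town index, which is an arbitrary choice.

```python
import math

def nearest_neighbour_tour(towns, start=0):
    """Initial route: from the current town always go to the nearest unvisited town."""
    unvisited = set(range(len(towns))) - {start}
    tour = [start]
    while unvisited:
        here = towns[tour[-1]]
        # ties are broken by the smaller town index
        nxt = min(unvisited, key=lambda j: (math.dist(here, towns[j]), j))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def tour_length(towns, tour):
    """Length of the closed route, including the return to the starting town."""
    return sum(math.dist(towns[a], towns[b]) for a, b in zip(tour, tour[1:] + tour[:1]))
```

For the four corners of a unit square listed counter-clockwise, the rule returns the perimeter route [0, 1, 2, 3] of length 4.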


11.4.7 
Dynamic System Modelling 


Identification of dynamic systems with neural networks is most effective for substantially nonlinear systems, systems with variable parameters and structure, and multivariate and classified systems. In the simplest case of linear systems, identification is performed by applying a step signal of arbitrary amplitude to the system input and computing the z-transform of the transient response. The system model is then a z-filter whose coefficients are those of the z-transfer function. For more complicated systems, the initial information about the object described above can be obtained by applying a sequence of step signals of different amplitudes to the input, covering the range of input signal variation from zero to $x_{\max}$ (Fig. 11.8). Because the object is substantially nonlinear, the responses to step signals of different amplitudes will not be linearly related. Nevertheless, the z-transforms of the system responses to step inputs of different amplitudes can be used to form the initial values of the first-layer coefficients of the multilayer neural network that identifies the explored object. Two questions remain subjects for further research. The first is how to form the initial coefficient values of the subsequent layers of a multilayer neural network with full sequential connections. The second is how to form the feedback coefficients when neural networks with adjustable feedback connections are used for identification.

Fig. 11.8.
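The sketch below shows, under simplifying assumptions, how responses to step inputs of several amplitudes could be turned into rows of initial first-layer weights. It assumes a hypothetical discrete object given as a one-step function `system(y_prev, u)` and truncates each z-transfer function to `memory` coefficients. The coefficients are recovered as differences of the step response divided by the step amplitude. For a linear object all rows coincide. For a nonlinear object they differ, and that difference is the information the amplitude sweep is meant to capture.

```python
import numpy as np

def step_response(system, amplitude, length):
    """Response of a discrete object to a step of the given amplitude.
    `system(y_prev, u)` is a hypothetical one-step model returning the next output."""
    y, out = 0.0, []
    for _ in range(length):
        y = system(y, amplitude)
        out.append(y)
    return np.array(out)

def impulse_coefficients(step, amplitude):
    """h[k] = (s[k] - s[k-1]) / A: coefficients of H(z) = sum_k h[k] z**-k obtained
    from the step response s of amplitude A (exact only for a linear object)."""
    return np.diff(step, prepend=0.0) / amplitude

def first_layer_initial_weights(system, amplitudes, memory):
    """One row of initial first-layer weights per step amplitude."""
    return np.vstack([impulse_coefficients(step_response(system, a, memory), a)
                      for a in amplitudes])

def linear_object(y_prev, u):
    """Hypothetical linear object: y[k] = 0.5 y[k-1] + u[k]."""
    return 0.5 * y_prev + u

def saturating_object(y_prev, u):
    """Hypothetical nonlinear object with input saturation."""
    return 0.5 * y_prev + np.tanh(u)
```

For `linear_object`, every amplitude yields the coefficients 0.5**k. For `saturating_object`, the rows depend on the step amplitude.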


11.4.8 
Conclusion 


The choice of initial conditions for adjusting a multilayer neural network serves two purposes: it speeds up the convergence of adaptive algorithms, and it helps ensure convergence to the global extremum of the optimization functional. Unfortunately, classical computational mathematics pays practically no attention to this matter when developing iterative algorithms for solving complex problems. Each problem solved in the neural network logical basis requires its own method of forming initial conditions for the adaptive adjustment algorithm of the multilayer neural network. Developing such methods for the class of problems solved in the neural network logical basis is a matter for future research.


11.5 
Typical Input Signal of Multilayer Neural Networks 


A class of typical signals is selected so that the quality of multilayer neural networks can be compared objectively in the adjustment mode and in the stationary state. This problem has been solved relatively completely for linear automatic control systems with deterministic and random input signals. There, a relatively complete class of deterministic input signals is the class of polynomial input signals, which are usually used to estimate the quality of control systems. The main characteristic of signal complexity is the order of the corresponding polynomial or, for multilayer neural networks, the modality of the distribution $f(x)$. It is reasonable to assume that in the self-learning case the distribution of the typical stationary input signal of a multilayer neural network is multimodal, with the modes of $f(x)$ located relatively uniformly in the physically realizable pattern space.

Figure 11.9 shows the complete class of typical multilayer neural network input signals in the self-learning mode. The signals are illustrated by the isolines of $f(x)$ in the physically realizable pattern space (the two-dimensional representation of the X space is conditional). Here $r$ is the complexity of the typical input signal of the multilayer neural network. The variance of each mode of $f(x)$ must be chosen so that the modes are sufficiently pronounced. Figure 11.10 shows the isolines of $f_1(x)$ and $f_2(x)$ for the typical input signals when a multilayer neural network learns to recognize two pattern classes ($f_1$: empty circles; $f_2$: shaded circles).

Fig. 11.9. Conditional representation of the typical neural network input signals in the self-learning mode, ordered by the degree of complexity

Fig. 11.10. Conditional representation of two classes of the neural network input signals in the learning mode, ordered by the degree of complexity

It should be noted that each specific problem solved by a neural network requires its own method of choosing typical input signals.
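Below is a minimal generator of "typical" input signals of increasing complexity r. It relies on its own assumptions. The modes are equal-weight isotropic Gaussians placed on a near-square grid in the unit square, which gives a relatively homogeneous placement. The standard deviation is a fixed fraction of the grid spacing so that the modes stay pronounced. For the learning mode, the modes are assigned to two classes in a checkerboard pattern. The name `typical_input_signal` and its parameters are hypothetical.

```python
import numpy as np

def typical_input_signal(r, n_samples, spread=0.15, seed=0):
    """Hypothetical typical input signal of complexity r: an equal-weight mixture of r
    isotropic Gaussian modes on a near-square grid in the unit square.
    `spread` is each mode's standard deviation relative to the grid step.
    Returns samples, mode indices, two-class labels, and mode centres."""
    rng = np.random.default_rng(seed)
    side = int(np.ceil(np.sqrt(r)))
    step = 1.0 / side
    cells = [(i, j) for i in range(side) for j in range(side)][:r]
    centres = np.array([((i + 0.5) * step, (j + 0.5) * step) for i, j in cells])
    # checkerboard assignment of modes to two classes for the learning mode
    mode_class = np.array([(i + j) % 2 for i, j in cells])
    modes = rng.integers(0, r, size=n_samples)            # modes are equally probable
    x = centres[modes] + spread * step * rng.standard_normal((n_samples, 2))
    return x, modes, mode_class[modes], centres
```

Increasing `r` gives a sequence of signals ordered by complexity. Increasing `spread` makes the modes, and hence the classes, overlap more.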
Chapter 12


Analysis of Closed-Loop Multilayer Neural Networks 


12.1 
Problem Statement for the Synthesis of the Multilayer Neural Networks 
Adjusted in the Closed Cycle 


This chapter presents the final stage of the synthesis of multilayer neural networks with fixed structure that are adjusted in the closed cycle. It is assumed that the open-loop neural network structure, the general characteristics of the signal, and the adjustment algorithm of the multilayer neural network are given. Several problems must be solved to estimate the quality of closed-loop multilayer neural networks.

The first is the selection of initial conditions for adjusting the weighting coefficients of the multilayer neural network. Two methods of selecting initial conditions are considered. The first is random selection, with the results averaged over a number of random draws and a search for all local and global extrema. The second is a deterministic method that places the neural network in the region of the global extremum of the secondary optimization functional by defining some piecewise-linear dividing surface at the initial stage.

The second problem is the selection of a class of typical input signals for multilayer neural networks, used to estimate their quality in the transient and stationary modes. The complexity of the input signal will be determined, in particular, by the modality of the conditional distribution $f(x/\varepsilon)$.

The third problem is the selection of the parameter matrix $K^*$ in the algorithm that searches for the extremum of the secondary optimization functional. This problem can be solved analytically or by statistical modeling. The general analytical method for closed-loop neural networks consists of the following steps:


1. Determination of the density of probability distribution for the estimation of the 
secondary optimization functional gradient vector; 

2. Derivation of the stochastic differential equation for the change of the distribution 
density of the adjustable neural network coefficients in the adjustment process; 

3. Solution of this equation; 

4. Determination of the distribution parameters of the primary optimization functional by integration over the feature space and over the neural network state space.


Using the results of the analysis according to these steps, together with the requirement to provide a given quality estimated by the value of the primary functional, one can solve the problem of synthesizing the neural network adjustment circuit. Note that the analytical solution at the third step is rather complicated. For this reason, such methods are illustrated in the present study only by some particular examples. Statistical analysis is considered the main method.
12.2
Investigation of the Neuron Under the Multi-Modal Distribution 
of the Input Signal 


12.2.1 
One-Dimensional Case - Search Adjustment Algorithm 


A neuron with two solutions and minimization of the second moment of the discrete error $\alpha_g$ was modeled. The block diagram of the modeled system is shown in Fig. 12.1. The analysis examined whether closed-loop systems with a search adaptation procedure can be designed. The sets of first- and second-class patterns have multimodal distributions. This is a case of insufficient system structure: the structural complexity is lower than the complexity of the problem being solved, so the potential recognition quality cannot be achieved in principle.

Fig. 12.1. Block scheme of the search neural network adjustable in the closed cycle with minimization of the discrete error second moment: 1 - square-law function generator; 2 - T-cycle delay unit

Fig. 12.2. The input signal and optimization functional characteristics: I - the first class; II - the second class

Figure 12.2 shows the distribution densities of the first- and second-class pattern sets. It also shows the dependence of the optimization functional $\alpha_g$ on the threshold $a_0$ when the region to the left of the threshold is assigned to the first class and the region to the right to the second class. In the search procedure, the gradient of $\alpha_g$ was calculated by the following expression:

$$\frac{d\alpha_g}{da_0} = \frac{\alpha_g(a_0 + \Delta a_0) - \alpha_g(a_0 - \Delta a_0)}{2\,\Delta a_0}$$

where $\Delta a_0$ is the amplitude of the search oscillations. The derivative $d\alpha_g/da_0$ was estimated by averaging over $m_n$ realizations of the system input signal. The main aim of the modeling was to estimate how $\Delta a_0$, $K^*$, $m_n$, and $a_0(0)$ influence the dynamics of the adjustment circuit for the coefficient $a_0$. The modeling results are as follows:


1. The search oscillations are suitable for designing the closed-cycle adjustment block of the neural network. A higher value of $\Delta a_0$ results in a higher precision of the adjustment circuit in the stationary state (Fig. 12.3);

2. A higher value of $K^*$ results in a lower systematic error and a higher random error of the iterative search for the optimal solution (Fig. 12.4);

3. A higher value of $m_n$ results in lower random errors and higher dynamic errors in the adjustment circuit (Fig. 12.5);

4. For any initial conditions $a_0(0)$ (Fig. 12.6), the iterative search for the optimal solution converges to one of the local extrema. Fig. 12.7 shows the results of the algorithm when random elements are introduced into the search procedure.
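The search adjustment of section 12.2.1 can be sketched as follows for a two-solution neuron with threshold $a_0$. The gradient of the discrete error second moment is estimated by the symmetric difference with search amplitude `step` (the $\Delta a_0$ of the text). Each iteration uses `memory` fresh realizations (the $m_n$ of the text), and the threshold is moved against the estimate with gain `gain` ($K^*$). The descent sign convention, the output coding $\pm 1$, and the two-Gaussian test signal are assumptions of this sketch. Replacing `two_gaussian_classes` with a multimodal sampler would let one study how the end point depends on $a_0(0)$.

```python
import numpy as np

def discrete_error_second_moment(threshold, x, eps):
    """Mean of (eps - y)**2 for the two-solution neuron y = +1 right of the threshold, -1 left."""
    y = np.where(x > threshold, 1.0, -1.0)
    return np.mean((eps - y) ** 2)

def search_adjustment(sample, a0, step, gain, memory, iterations, rng):
    """Closed-cycle search adjustment of the threshold a0 (sketch)."""
    history = [a0]
    for _ in range(iterations):
        x, eps = sample(memory, rng)                      # m_n fresh realizations
        # symmetric search oscillation of amplitude `step` around the current threshold
        grad = (discrete_error_second_moment(a0 + step, x, eps)
                - discrete_error_second_moment(a0 - step, x, eps)) / (2.0 * step)
        a0 = a0 - gain * grad                             # move against the estimated gradient
        history.append(a0)
    return np.array(history)

def two_gaussian_classes(n, rng, centre=2.0, sigma=1.0):
    """Hypothetical test signal: equiprobable classes at -centre and +centre."""
    eps = rng.choice((-1.0, 1.0), size=n)                 # teacher instruction
    x = centre * eps + sigma * rng.standard_normal(n)
    return x, eps

rng = np.random.default_rng(1)
history = search_adjustment(two_gaussian_classes, a0=1.5, step=0.5, gain=0.5,
                            memory=20, iterations=500, rng=rng)
```

Evaluating both trial thresholds on the same realizations keeps the difference estimate less noisy than evaluating them on separate samples.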
Fig. 12.3. Influence of the step value on the system adjustment dynamics when $K^* = 0.5$; $m_n = 20$; $a_0(0) = 0$: 1 - $\Delta a_0 = 0.25$; 2 - $\Delta a_0 = 0.5$; 3 - $\Delta a_0 = 1$

Fig. 12.4. Influence of the $K^*$ value on the system adjustment dynamics when $\Delta a_0 = 0.25$; $m_n = 20$; $a_0(0) = 4$: 1 - $K^* = 0.25$; 2 - $K^* = 0.5$; 3 - $K^* = 1$; 4 - $K^* = 2$

Fig. 12.5. Influence of the system adjustment block memory $m_n$ when $\Delta a_0 = 0.25$; $K^* = 0.5$; $a_0(0) = 7.32$: 1 - $m_n = 5$; 2 - $m_n = 10$; 3 - $m_n = 20$


12.2.2 
Multidimensional Case - Analytical Adjustment Algorithm 


The analytical adjustment procedure for a multimodal input signal distribution was investigated on the example of minimizing the optimization functional in a neuron with a continuum of solutions (Chap. 1) and an arctangent activation function ($\beta = 10$).

The following problems were analyzed experimentally: 


1. The influence of the initial conditions on the convergence of the iterative procedure when searching for one local extremum;

2. The influence of the step value and the feature space dimensionality on the convergence rate of the iterative procedure; the stability of the gradient procedure; the influence of the variance on the quality of convergence of the iterative process;

3. The influence of the gradient calculation method on the quality of the search process;

4. The influence of the system adjustment block memory $m_n$.


Fig. 12.6. Influence of the initial conditions on the system adjustment dynamics when $\Delta a_0 = 0.25$; $K^* = 10$; $m_n = 10$: 1 - $a_0(0) = 0$; 2 - $a_0(0) = 3$; 3 - $a_0(0) = 4$; 4 - $a_0(0) = 7$; 5 - $a_0(0) = 9$

Fig. 12.7. The system adjustment dynamics under a set of random initial conditions: 1 - $\Delta a_0 = 0.25$, $K^* = 0.5$, $m_n = 10$; 2 - $\Delta a_0 = 0.25$, $K^* = 0.25$, $m_n = 20$; 3 - $\Delta a_0 = 2$, $K^* = 2$, $m_n = 10$


The investigation was carried out with a generator of random vectors $x$ and teacher instructions $\varepsilon$. The multimodal distribution of the random vectors is shown in Fig. 12.8. The circles are lines of equal probability density for each mode (solid lines: first-class patterns; dashed lines: second-class patterns). The total number of modes was $Z = 10$, and the root-mean-square deviation of each mode was $\sigma = 2$.

Fig. 12.8.
The first experiment tested the stability of the neuron coefficient vector in the optimal state. To do this, the vector was given optimal initial conditions corresponding to one of the local extrema (positions 1 and 1a in Fig. 12.9), and the learning process was started. The initial (1 and 1a) and final (1' and 1'a) hyperplane positions show that the extremum corresponding to one of the minima of the average risk function is stable. The oscillations around these relatively stable positions result from the stochastic nature of the minimized neural network quality functional. A deviation from the optimal position, with rotational displacement (3) and without it (2), moves the hyperplane into the nearest local minimum (2' and 3'). Figure 12.9 shows the initial (1, 2, 3, 4) and final (1', 2', 3', 4') positions of the dividing hyperplane for different initial conditions.

Figure 12.10 shows the adjustment dynamics of the coefficients of the line (a hyperplane in the general case) under a multimodal input signal distribution. In this case, it was convenient to use the intercept form of the equation of a straight line (Fig. 12.10) and to trace how the intercept lengths change. Under the optimal initial conditions (1, 2), the system oscillates slightly around the optimal position. The large oscillations of line 3 are caused by large values of the functional gradient, which is a property of points close to a local extremum. Points far from a local extremum have small gradient values, so they move slowly. Consequently, some a priori information about the quality functional is needed to select proper search intervals and initial steps of the gradient procedure. Examples of such information are constraints on the search space and expected characteristics of the extremum locations.

Fig. 12.9. Neuron coefficient adjustment dynamics under a multimodal input signal distribution and $m_n = 30$ (numerals are the experiment numbers): dashed lines are the initial hyperplane positions; continuous lines are the final hyperplane positions

Fig. 12.10. Neuron coefficient adjustment dynamics under a multimodal input signal distribution: 1-4 are the experiment numbers

Interesting results were obtained when investigating the influence of the variance (the level of class intersection) on the adjustment process. If the variance is small relative to the distances between modes, the exact position of the dividing surface is not critical, because the classes do not overlap and the local extrema are not sharp. In one experiment, the first-class variance was several times greater than the second-class variance. The optimal position of the dividing hyperplane then shifted toward the mode with the smaller variance. This is to be expected for a system adjusted by the average risk function.

The stability of the gradient procedure was achieved by selecting the step value experimentally and by constraining the increments of the vector components. The component increments were not allowed to exceed one quarter of the distance between local extrema, and with this constraint the learning procedure was smooth.
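The following is a sketch of the constrained gradient procedure for a neuron with a continuum of solutions. It assumes the output $y = (2/\pi)\arctan(\beta(a_0 + a \cdot x))$ with $\beta = 10$ and takes the error second moment as the minimized functional. The step 4 and the increment constraint 0.03 are the values quoted for Fig. 12.11. The data generator in the usage lines and all function names are hypothetical.

```python
import numpy as np

BETA = 10.0  # activation steepness used in the experiments of this section

def neuron_output(a, X):
    """Neuron with a continuum of solutions: y = (2/pi) * arctan(beta * (a0 + a . x))."""
    g = a[0] + X @ a[1:]
    return (2.0 / np.pi) * np.arctan(BETA * g), g

def error_second_moment(a, X, eps):
    y, _ = neuron_output(a, X)
    return np.mean((eps - y) ** 2)

def gradient(a, X, eps):
    """Analytical gradient of the error second moment with respect to (a0, a1, ..., aN)."""
    y, g = neuron_output(a, X)
    dy_dg = (2.0 / np.pi) * BETA / (1.0 + (BETA * g) ** 2)
    common = -2.0 * (eps - y) * dy_dg                # d(eps - y)**2 / dg for each sample
    X1 = np.hstack([np.ones((len(X), 1)), X])        # constant input 1 for the threshold a0
    return X1.T @ common / len(X)

def clipped_gradient_descent(a, X, eps, step=4.0, limit=0.03, iterations=300):
    """Gradient procedure with every component increment bounded by `limit`."""
    increments = []
    for _ in range(iterations):
        delta = np.clip(-step * gradient(a, X, eps), -limit, limit)
        a = a + delta
        increments.append(delta)
    return a, np.array(increments)

rng = np.random.default_rng(0)
eps = rng.choice([-1.0, 1.0], size=400)                  # teacher instructions
X = 2.0 * eps[:, None] + rng.standard_normal((400, 2))   # two Gaussian classes in the plane
a_init = np.array([1.0, 1.0, 0.0])                       # deliberately shifted hyperplane
a_final, increments = clipped_gradient_descent(a_init, X, eps)
```

With a steep activation, the raw gradient can be large near the dividing surface and almost zero far from it. Clipping each increment keeps the trajectory smooth without tuning the step to the worst case.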


Fig. 12.11. Neuron coefficient adjustment dynamics under four modes of the input signal distribution: 1 is the first minimum; 2 is the second minimum
Fig. 12.11 shows two pairs of curves for the search dynamics of two minima under four modes of the input signal distribution. The initial optimal step of the gradient procedure was 4, and the constraint on $\Delta a_i$ was 0.03. Interestingly, the neuron coefficients changed sign during adjustment. In the search for the first minimum, the image point moves from the second to the first quadrant of the adjustable coefficient space. In the search for the second minimum, it moves from the third to the first quadrant.


12.3 
Investigation of Dynamics for the Neural Networks of Particular Form 
for the Non-Stationary Pattern Recognition 


This section deals with a one-dimensional neural network with minimization of the second moment of the analog error (Chap. 9). The aim of the investigation was to estimate how various system characteristics influence the functioning of the closed-loop adjustment circuit.

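The analog error of the system has the following form:

$$x_g(n\Delta T) = \varepsilon(n\Delta T) - x(n\Delta T) + a_0(n\Delta T) \qquad (12.1)$$

Consequently,

$$\overline{x_g^2(n\Delta T)} = \overline{\varepsilon^2(n\Delta T)} + \overline{x^2(n\Delta T)} + \overline{a_0^2(n\Delta T)} - 2\,\overline{\varepsilon(n\Delta T)\,x(n\Delta T)} + 2\,\overline{a_0(n\Delta T)\,\varepsilon(n\Delta T)} - 2\,\overline{a_0(n\Delta T)\,x(n\Delta T)}$$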


The overbars denote averaging at the time instant $n\Delta T$ over the ensemble of realizations of the nonstationary random process. Only one realization of the nonstationary random process $x_g(n\Delta T)$ is available. Ensemble averaging must therefore be replaced by time averaging over the memory interval $m_n$, with additional constraints based on a priori information about how the distribution parameters of the nonstationary random process change over the memory interval. The most suitable representation in this case is the sum of a stationary random process and a deterministic process whose changes have a known functional form [12-1].

Since the derivative

$$\frac{d\,\overline{x_g^2(n\Delta T)}}{d\,a_0(n\Delta T)}$$

cannot be expressed in algebraic form, let us assume that $a_0(n\Delta T)$ is fixed on the averaging interval $m_n$. The value $a_0(n\Delta T)$ is changed in the adaptation mode with a cycle equal to the memory $m_n$ of the system adjustment block. Consequently,

$$\frac{d\,\overline{x_g^2(n\Delta T)}}{d\,a_0(n\Delta T)} = 2\left[a_0(n\Delta T) + \overline{\varepsilon(n\Delta T)} - \overline{x(n\Delta T)}\right] = 2\,\overline{x_g(n\Delta T)}$$

The closed-cycle adjustment algorithm has the following form:


$$a_0[(n + m_n)\Delta T] = a_0(n\Delta T) + K^*\,\overline{x_g(n\Delta T)} \qquad (12.2)$$

For nonstationary patterns, the process $x_g(n\Delta T)$ is nonstationary, and its characteristics are determined by the nonstationary characteristics of the input signal (Chap. 9). The task of determining the average

$$\overline{x_g(n\Delta T)}$$

is the classical problem of filtering nonstationary discrete random processes [12-3 to 12-7]. The neural network was modeled using methods of recursive implementation of optimal discrete filters [12-6].

If $m_n = m = \text{const}$, then for each $n$, with $\Delta T = 1$,

$$a_0(n + m) = a_0(n) + K^* \sum_{i=0}^{m-1} W(i,n)\, x_g(n - i)$$

where $W(i,n)$ is the optimal weighting (impulse response) function of the filter that estimates $\overline{x_g(n\Delta T)}$. Below, the expressions for $W(i,n)$ from [12-6, 12-7] are used.
The investigation covered how the following characteristics of the input signal and of the system influence the dynamics of the closed-cycle adjustable system:


1. The time course of the mathematical expectations of the pattern sets (assumed to follow the same law for both classes);

2. The level of class intersection, determined by the variance (equal for both classes) at a fixed difference between the mathematical expectations of the first- and second-class pattern sets;

3. The level of nonstationarity, determined in particular by the rate of change of the class center coordinates;

4. The memory $m_n$ of the closed-cycle adjustment block;

5. The prediction time $\theta$ used by the closed-cycle adjustment block when estimating the gradient of the secondary functional;

6. The amplification coefficient $K^*$ of the closed-cycle adjustment block.


Figures 12.12-12.19 show how the neuron threshold changes over time when the class center coordinates change according to linear laws. Two laws with different rates of change were used: $(2t + 3)$ and $[(1/2)t + 3]$. The distance between the class centers is fixed in all experiments.
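The following is a minimal simulation of the closed-cycle threshold adjustment (12.1)-(12.2) under stated assumptions. There are two equiprobable classes with equal variance, and their centres drift linearly at a common rate. The teacher instructions are $\varepsilon = \pm 1$. A uniform average over each memory interval replaces the optimal weighting function $W(i,n)$ of [12-6, 12-7]. With equal priors and variances, the ideal threshold is taken to be the midpoint between the centres. The function and parameter names are hypothetical. The sketch is meant for exploring the influence of memory, gain, and drift rate, not for reproducing Figs. 12.12-12.19.

```python
import numpy as np

def simulate_threshold_adjustment(n_cycles, memory, gain, rate, offset=3.0,
                                  half_distance=2.0, sigma=1.0, seed=0):
    """Closed-cycle adjustment of a one-dimensional neuron threshold a0 while both class
    centres drift as rate * t + offset -/+ half_distance (hypothetical set-up).
    Uses x_g = eps - x + a0 and a0 <- a0 + K* * mean(x_g) once per memory interval."""
    rng = np.random.default_rng(seed)
    a0, n = 0.0, 0
    thresholds, ideal = [], []
    for _ in range(n_cycles):
        errors = np.empty(memory)
        for i in range(memory):
            midpoint = rate * n + offset                     # midpoint between the class centres
            eps = rng.choice((-1.0, 1.0))                    # teacher instruction
            x = midpoint + half_distance * eps + sigma * rng.standard_normal()
            errors[i] = eps - x + a0                         # analog error, a0 held fixed
            n += 1
        # uniform averaging over the memory interval stands in for the optimal W(i, n)
        a0 = a0 + gain * errors.mean()
        thresholds.append(a0)
        ideal.append(rate * n + offset)
    return np.array(thresholds), np.array(ideal)

th, ideal = simulate_threshold_adjustment(300, memory=20, gain=-0.5, rate=0.0)
```

Setting `rate=2.0` or `rate=0.5` corresponds to the two linear laws $(2t + 3)$ and $[(1/2)t + 3]$, with $t$ measured in samples here (an assumption of the sketch).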

Groups of curves I and II correspond to these two linear laws. Analysis of the data leads to the following conclusions:


1. Increasing the memory $m_n$ of the recognition system reduces the influence of the class intersection level on the random adjustment error;

2. Increasing $m_n$ increases the systematic error of the coefficient adjustment (Figs. 12.14, 12.15);
Fig. 12.12. Dynamics of the system closed-cycle adjustment under nonstationary pattern recognition at $\sigma = 3$; $\theta = 0$; $K^* = -0.1$: curves for $m = 20$ and $m = 3$; ideal threshold value

Fig. 12.13. Dynamics of the system closed-cycle adjustment under nonstationary pattern recognition at $\sigma = 10$; $\theta = 0$; $K^* = -0.1$: curves for $m = 20$ and $m = 3$; ideal threshold value

Fig. 12.14. Dynamics of the system closed-cycle adjustment under nonstationary pattern recognition at $K^* = -0.5$; $m = 3$: curves for $\theta = 2$, $\sigma = 1$; $\theta = 10$, $\sigma = 0.5$; $\theta = 20$, $\sigma = 5$; $\theta = 40$, $\sigma = 5$; ideal threshold value

Fig. 12.15. Dynamics of the system closed-cycle adjustment under nonstationary pattern recognition at $K^* = -0.5$; $m = 20$: curves for $\sigma = 1$, $\theta = 2$; $\sigma = 5$, $\theta = 10$; $\sigma = 5$, $\theta = 20$; $\sigma = 5$, $\theta = 40$
3. The adjustment process is unstable for small values of $m_n$ ($m_n = 5$) and $K^* = -2$. Increasing $m_n$ to $m_n = 20$ makes the adjustment process stable. Consequently, nonstationary pattern recognition systems require memory in the adjustment block ($m_n > 1$). Increasing $m_n$ compensates to some extent for the lack of a priori information about $K^*$;

4. The rate at which the class center coordinates change over time has practically no influence on the errors of the adjustment circuit;

5. The requirement $K^* < -1$ is necessary for the stability of the adjustment circuit;

6. Under unstable conditions, a characteristic modulation of the envelope of the threshold changes is observed;

7. Replacing the linear law of change of the class center coordinates with a quadratic law confirms conclusions 1-6. For sufficiently large values of $m_n$, the changes in the systematic error of the adjustment circuit follow a regular pattern (negative for $K^* > -1$ and positive for $K^* < -1$).


Figures 12.17-12.19 show that the level of class intersection strongly influences the adjustment circuit in the self-oscillatory adjustment process when $K^* = -2$. For large values of $\sigma$, the adjustment process diverges. For small $\sigma$, however, the oscillating adjustment process periodically changes its amplitude relative to the ideal threshold value and at some moments becomes rather precise.

The experiments with solution prediction over the time interval $\theta$ (Figs. 12.14, 12.15) showed the following:


Fig. 12.16. Dynamics of the system closed-cycle adjustment under nonstationary pattern recognition at $\sigma = 5$; $\theta = 0$; $m = 20$: curves for $K^* = -0.5$, $K^* = -0.75$, and $K^* = -2$; ideal threshold value

Fig. 12.17. Dynamics of the system closed-cycle adjustment under nonstationary pattern recognition at $\sigma = 5$; $\theta = 0$; $m = 5$: curves for $K^* = -0.5$, $K^* = -0.75$, $K^* = -1$, and $K^* = -2$; ideal threshold value
1. Increasing $\theta$ increases the random error in the closed-cycle adjustment circuit;

2. Increasing $\theta$ and decreasing $m_n$ at fixed $\sigma$ increases the random error in the closed-cycle adjustment circuit;

3. A comparison of the linear and quadratic laws of change of the class center coordinates showed that the random error in the closed-cycle adjustment circuit was larger in the former case.


12.4 
Dynamics of the Three-Layer Neural Network in the Learning Mode 


The neural network considered here is assumed to have a continuum of solutions. The first, second, and third layers consisted of 3, 2, and 1 neurons, respectively. The feature space was multidimensional in the general case and two-dimensional in the particular case. The open-loop neural network was described by the following expression:


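$$y = F\left(a''_{10} + \sum_{h_2=1}^{2} a''_{1h_2}\, F\left(a'_{h_2 0} + \sum_{h_1=1}^{3} a'_{h_2 h_1}\, F\left(\sum_{h_0=0}^{N} a_{h_1 h_0}\, x_{h_0}\right)\right)\right), \qquad x_0 = 1 \qquad (12.3)$$

where $F$ is the activation function of the neurons.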
Fig. 12.18. Dynamics of the system closed-cycle adjustment under nonstationary pattern recognition at $\sigma = 3$; $\theta = 0$; $K^* = -2$: ideal threshold value; $m = 5$

Fig. 12.19. Dynamics of the system closed-cycle adjustment under nonstationary pattern recognition at $\sigma = 10$; $\theta = 0$; $K^* = -2$: ideal threshold value



The expressions for the gradient estimates have the following form:
The lines of equal probability density for the first- and second-class pattern distributions are shown in Fig. 12.20. The state of the neural network is defined as follows. The three first-layer neurons have the coefficients $a_{10} = -12$, $a_{11} = 1$, $a_{12} = 1$; $a_{20} = 24$, $a_{21} = -1$, $a_{22} = -1$; and $a_{30} = -36$, $a_{31} = 1$, $a_{32} = 1$. The second-layer neurons have the coefficients $a'_{10} = 0$, $a'_{11} = 1$, $a'_{12} = 1$, $a'_{13} = 1$; and $a'_{20} = 0$, $a'_{21} = 1$, $a'_{22} = 1$, $a'_{23} = 1$. The third-layer neuron has the coefficients $a''_{10} = 0$, $a''_{11} = 1$, and $a''_{12} = 1$.

In the experiments, the input feature space dimensionality was $N = 2$, and the number of modes of $f(x)$ was 4.
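The sketch below evaluates the three-layer network with the coefficient set listed above, to make it concrete. It assumes the scaled arctangent activation $F(g) = (2/\pi)\arctan(\beta g)$ with $\beta = 10$, borrowed from Section 12.2.2, because the activation used here is not stated. It also assumes a constant input 1 for every threshold coefficient. Under these assumptions, the first layer realizes the lines $x_1 + x_2 = 12, 24, 36$. The sign of the network output then alternates across the four bands these lines bound, which is consistent with four modes of $f(x)$.

```python
import numpy as np

BETA = 10.0  # assumed activation steepness

def activation(g):
    """Assumed neuron nonlinearity with a continuum of solutions (scaled arctangent)."""
    return (2.0 / np.pi) * np.arctan(BETA * g)

# Coefficients from the text: one row per neuron, column 0 is the threshold coefficient
FIRST_LAYER = np.array([[-12.0, 1.0, 1.0],
                        [24.0, -1.0, -1.0],
                        [-36.0, 1.0, 1.0]])       # a_{h1 h0}
SECOND_LAYER = np.array([[0.0, 1.0, 1.0, 1.0],
                         [0.0, 1.0, 1.0, 1.0]])   # a'_{h2 h1}
THIRD_LAYER = np.array([[0.0, 1.0, 1.0]])         # a''_{1 h2}

def layer(coeffs, inputs):
    # prepend the constant input 1 that multiplies the threshold coefficient
    return activation(coeffs @ np.concatenate(([1.0], inputs)))

def network(x):
    """Output of the three-layer network for a two-dimensional feature vector x."""
    return layer(THIRD_LAYER, layer(SECOND_LAYER, layer(FIRST_LAYER, x)))[0]
```

Points on the diagonal with $x_1 + x_2$ equal to 6, 18, 30, and 42 fall into successive bands, and the output is negative, positive, negative, and positive, respectively.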


Experiments with the first neuron layer (the second and third layers are optimal). The conditions in each experiment were as follows:


Fig. 12.20. Initial and final positions of the divisional surfaces realized by the neurons in experiments 1-3: (a) the first layer (axes $x_1$, $x_2$); (b) the second layer
1-1. The aforementioned hyperplane coefficients are optimal;

1-2. Parallel shift of the hyperplanes realized by the neurons of the first neural network layer (initial coefficients: 1, 1, -8, -1, -1.2; 1, 1, -32);

1-3. Parallel shift of two hyperplanes realized by the neurons of the first neural network layer in different directions (initial coefficients: 1, 1, -8; -1, -1, 24; 1, 1, -40).


Experiments with the second neuron layer (the first and the third layers are optimal). 


2-1. Rotation of one hyperplane realized by the neurons of the second neural network layer by the angle $\alpha = \pi$ (initial coefficients: -1, -1, -1; 1, 1, 1);

2-2. Rotation of two hyperplanes realized by the neurons of the second neural network layer by the angle $\alpha = \pi$.


Experiments with the third neuron layer (the first and the second layers are optimal). 


3-1. Rotation of the hyperplane realized by the neuron of the third neural network layer by the angle $\alpha = \pi$.


The following results were obtained. 

Figures 12.21-12.23 illustrate the coefficient adjustment procedure. The vertical axis represents the coefficient values, and the horizontal axis represents the number of iterations. The level of the coordinate axis corresponds to the optimal coefficient ratios. Experiment 1-1 confirms the stability of the coefficient values in the optimal state: at a sufficiently large number of iterations, their deviations from the optimal values remain small. In experiments 1-2 and 1-3, the gradient procedure adjusts the coefficients so that the divisional surfaces reach the optimal position after 25-30 iterations.


Fig. 12.21. Coefficient adjustment dynamics in experiment 1-2 (the number of iterations is 50) at $m_n = 50$; $K^* = 0.1$; $K_1 = 0.01$; $K_2 = 0.1$; and $K_3 = 0.1$, where $K_1$, $K_2$, $K_3$ are the weighting coefficients to $K^*$ for the neurons of the first, second, and third layers: 1 - the first neuron; 2 - the second neuron; 3 - the third neuron


Fig. 12.22. 
Fig. 12.23. Coefficient adjustment dynamics: (a) experiment 2-2; (b) experiment 3-1; 1 - the first neuron; 2 - the second neuron; 3 - the third neuron
The results of experiment 1-4 are rather interesting. The initial conditions placed the surfaces realized by the neurons of the first, second, and third layers in their optimal positions, but oriented so that an "inverse" classification took place. The adjustment resulted in parallel shifts of the planes, although a rotation by 180° is also possible. The divisional planes realized by the neurons of the second and third layers pass through the origin of coordinates, i.e., the adjustment procedure can only rotate these planes around the origin. Therefore, the experiments with the neurons of these layers included the adjustment of coefficients for planes turned through 180°.

Figure 12.23 shows the coefficient adjustment dynamics in experiments 2-2 and 3-1. The adjustment procedure turned the hyperplane realized by the neuron of the third layer through 180°, bringing it to the optimal position. After the adjustment of the third layer neuron, the coefficients of the first layer neurons also take their optimal values.

The results of the performed experiments confirmed the theoretical analysis of the investigated adjustment algorithm and demonstrated its high efficiency. The following problems remain to be analyzed:


1. Selection of the optimal coefficients K in the gradient procedure and of the relationship between their values for different layers of the multilayer neural network;

2. Analysis of the influence of the structural redundancy of the multilayer neural network on the adjustment quality.


12.5 
Investigation of the Particular Neural Network with Backward Connections 


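We consider here a one-dimensional neuron with feedback described by the following relationships:

$$y(n\Delta T) = \operatorname{sign} g(n\Delta T);\qquad g(n\Delta T) = x(n\Delta T) - a_0(n\Delta T) + a_1(n\Delta T)\,x_1[(n-1)\Delta T] \qquad (12.4)$$

where $\Delta T$ is the time interval between the presentations of the input patterns. The minimum $\alpha_g$ criterion is taken as the criterion of the secondary optimization. It is assumed that the coefficients $a_0$ and $a_1$ do not change within the averaging interval $m_n$ during the closed-cycle adjustment. The gradient estimates with respect to $a_0$ and $a_1$ have the following form:

$$\frac{\partial \alpha_g(n\Delta T)}{\partial a_0} = -2\,\overline{g(n\Delta T)}^{\,m_n};\qquad \frac{\partial \alpha_g(n\Delta T)}{\partial a_1} = 2\,\overline{g(n\Delta T)\,x_1[(n-1)\Delta T]}^{\,m_n} \qquad (12.5)$$

where the bar with the superscript $m_n$ denotes averaging over $m_n$ successive samples.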


The expressions (12.4) and (12.5) form the basis for the design of the corresponding closed system. The gradient measurements were averaged by an optimal discrete filter. Its synthesis used an a priori hypothesis about how the mathematical expectation of the input signal changes (stationary, linear, quadratic, etc.).
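A minimal simulation sketch of the closed-cycle adjustment defined by (12.4) and (12.5) follows. Several choices here are assumptions rather than the book's method: $\alpha_g$ is taken as the mean of $g^2$ (which reproduces the gradient form of (12.5)), a plain moving average over the last m samples stands in for the optimal discrete filter, the feedback signal $x_1[(n-1)\Delta T]$ is taken to be the previous output y, and k0, k1 are ordinary descent step sizes rather than the signed gains $K^*$ used in the figures.

```python
from collections import deque


def sign(v):
    """sign() of (12.4); sign(0) is taken as +1 (a convention of this sketch)."""
    return 1.0 if v >= 0 else -1.0


def adjust_neuron_with_feedback(xs, m, k0, k1, a0=0.0, a1=0.0):
    """Closed-cycle adjustment of the neuron with feedback (12.4).

    Assumed criterion: alpha_g = mean of g^2, so the gradient estimates are
    -2*mean(g) for a0 and 2*mean(g*x1_prev) for a1, averaged over the last m samples.
    Assumed feedback signal: x1[(n-1)dT] = previous output y.
    k1 = 0 with a1 = 0 gives the neuron without feedback.
    """
    g_window = deque(maxlen=m)
    gx_window = deque(maxlen=m)
    x1_prev = 0.0
    trajectory = []
    for x in xs:
        g = x - a0 + a1 * x1_prev                          # (12.4)
        y = sign(g)
        g_window.append(g)
        gx_window.append(g * x1_prev)
        grad_a0 = -2.0 * sum(g_window) / len(g_window)     # (12.5), a0 component
        grad_a1 = 2.0 * sum(gx_window) / len(gx_window)    # (12.5), a1 component
        a0 -= k0 * grad_a0                                 # descent step on the threshold
        a1 -= k1 * grad_a1                                 # descent step on the feedback gain
        x1_prev = y
        trajectory.append((a0, a1))
    return trajectory


if __name__ == "__main__":
    # Stationary input: without feedback the threshold should settle at the mean.
    print(adjust_neuron_with_feedback([3.0] * 200, m=5, k0=0.05, k1=0.0)[-1])
```

Feeding a drifting sequence such as `[0.02 * n + 3 for n in range(500)]` and comparing runs with k1 = 0 and k1 != 0 is one way to explore the systematic-error and oscillation effects described below; larger m or larger steps make the moving-average loop more oscillatory.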
Fig. 12.24. Dynamics of adjustment of the neuron with feedback ($m = 20$; $K^* = -0.5$) for the ideal cases $a_0 = 2t + 3$ and $a_0 = 0.5t + 3$: without feedback, and with positive or negative feedback at $K_1^* = \pm 0.5$

Fig. 12.25. Dynamics of adjustment of the neuron with feedback for the ideal cases $a_0 = 2t + 3$ and $a_0 = 0.5t + 3$: without feedback, and with positive or negative feedback at $K_1^* = \pm 0.5$

Fig. 12.26. Dynamics of adjustment of the neuron with feedback ($m = 5$; $K^* = -0.5$) for the ideal case $a_0 = 2t + 3$: without feedback, and with positive or negative feedback at $K_1^* = \pm 0.5$
The experimental results for the neuron with feedback, partially represented in Figs. 12.24-12.29, show the following:


1. Introducing positive or negative feedback into the open-loop system ($K_1^* > 0$ or $K_1^* < 0$ in the adjustment circuit of the coefficient $a_1$) gives the same adjustment results for the value $a_0$ and for the total threshold $a_\Sigma = a_0(n) + a_1(n)x_1(n-1)$, while the coefficients $a_1$ are equal in absolute value but opposite in sign;

2. At a sufficiently large memory of the adjustment block (about $m_n = 20$), $a_0$ and $a_1$ change in an oscillatory way. As the memory decreases (to about $m_n = 5$), the oscillations in the adjustment of $a_1$ sharply increase, while those of $a_0$ decrease;


Fig. 12.27. Dynamics of adjustment of the neuron with feedback for the ideal cases $a_0 = 2t + 3$ and $a_0 = 0.5t + 3$: without feedback, and with positive or negative feedback at $K_1^* = \pm 0.5$

Fig. 12.28. Dynamics of adjustment of the neuron with feedback ($m = 20$; $K^* = -0.5$) for the ideal cases $a_0 = 2t + 3$ and $a_0 = 0.5t + 3$, with positive or negative feedback at $K_1^* = \pm 0.5$

Fig. 12.29. Dynamics of adjustment of the neuron with feedback ($m = 5$; $K^* = -0.5$) for the ideal cases $a_0 = 2t + 3$ and $a_0 = 0.5t + 3$, with positive or negative feedback at $K_1^* = \pm 0.5$
3. The systematic error of the $a_0$ adjustment increases as the system memory increases or when feedback is introduced into the open-loop structure of the system. The systematic adjustment error for the total neuron threshold is practically zero. This is an advantage of the neural network with feedback over the neural network without feedback;

4. Decreasing the coefficient $K_1^*$ in the iteration procedure for the adjustment of the feedback gain brings the characteristics of the neuron with feedback closer to those of the neuron without feedback.


12.6 
Dynamics of One-Layer Neural Networks in the Learning Mode 


Three types of neural networks in the self-learning mode are considered below: the neural network with the search for the centers of the modes of the distribution f(x); the neural network in the form of a layer of neurons with two solutions; and the neural network in the form of a neuron with $K_p$ solutions.

The main goal of the study is to estimate the quality of the developed algorithms when the input signal x(n) has a distribution of arbitrary modality.


12.6.1 
Neural Network with the Search of the Distribution Mode Centers f(x) 


We consider the self-learning algorithm realizing the following recurrent relationship (Chap. 9):

$$b(x_k, n+1) = b(x_k, n) + K^*\,[x(n) - b(x_k, n)] \qquad (12.6)$$

The algorithm consists of the following stages:
1. The coordinates of the $K_p$-dimensional vector $b(x_k, 0)$ are randomly selected in the given interval of variation of x;

2. The next pattern x is presented to the input, and the center b closest to x is determined;

3. This coordinate of the vector $b(x_k)$ is changed according to (12.6);

4. The internal cycle returns to p. 2; then the external cycle returns to p. 1.

The distribution of the random input signal x(n) is the sum of normal distributions with given variances and mathematical expectations equal to 2, 4, ..., 16. The number of modes of f(x) was fixed at values from 2 to 8.
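Stages 1-4 amount to an online nearest-center update, and the following sketch shows them in Python. The function and parameter names are hypothetical, `gain` plays the role of $K^*$ in (12.6), and the mixture generator only mimics the kind of input described above; it is not the book's test program.

```python
import random


def search_mode_centers(samples, k, gain, lo, hi, init=None, rng=None):
    """Self-learning search for the mode centers of f(x) by rule (12.6).

    Stage 1: k centers are drawn uniformly from [lo, hi] unless `init` is given.
    Stage 2: for each pattern x the closest center is found.
    Stage 3: that center moves toward x: b <- b + gain * (x - b).
    """
    rng = rng or random.Random()
    centers = list(init) if init is not None else [rng.uniform(lo, hi) for _ in range(k)]
    for x in samples:
        i = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
        centers[i] += gain * (x - centers[i])
    return sorted(centers)


def mixture_samples(means, sigma, count, rng):
    """Hypothetical test input: equal-weight sum of normal distributions."""
    return [rng.gauss(rng.choice(means), sigma) for _ in range(count)]


if __name__ == "__main__":
    rng = random.Random(1)
    data = mixture_samples([2, 4, 6, 8], sigma=0.5, count=300, rng=rng)
    # External cycle: restart from new random initial conditions several times.
    for j in range(3):
        found = search_mode_centers(data, k=4, gain=0.02, lo=0, hi=10, rng=rng)
        print(j, [round(c, 2) for c in found])
```

Each restart of the external cycle can land in a different local extremum, which is why the tables below count the modes found per cycle and across all previous cycles.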


Figure 12.30 shows the typical change of the class center coordinates $b(x_k, n)$ during adjustment for one variant of the random initial conditions. The results of the algorithm are represented in Tables 12.1 and 12.2. In the tables, i is the number of the mode in the distribution f(x), Z is the modality of the distribution f(x), and j is the number of the cycle of injection of the random initial conditions $b(x_k, 0)$ in the search for the extremum of R. Table 12.1 was calculated at $K_p = 5$, M = 300 (the number of iterations over n), and $K^* = 0.02$. For each $\sigma$, the right column gives the number of modes of f(x) found in cycle j, and the left column gives the number of modes found during all the previous cycles. Similar data are represented in Table 12.2 for Z, $K_p$, $K^* = 0.01$, M = 100, and $\sigma = 0.5$. The analysis of the obtained results shows the following:


1. The algorithm efficiently solves a rather complex self-learning task;

2. The experimental results confirm the theoretical conclusions (Chap. 9) concerning the expected search for local and global extrema of the function;

3. An increase of $\sigma$ decreases the quality of the algorithm at fixed Z, $K_p$, $K^*$, i, j.


The considered algorithm was slightly modified: the set of initial conditions given as class center coordinates was replaced by the set of coordinates of the initially presented Z patterns. The resulting quality increase is illustrated by the data in Table 12.3, which gives the number of distribution modes obtained at each A-th step of the initial condition injections and the total number of distribution modes obtained over all A steps. The search was performed using both modifications of the described algorithm. The experiment parameters were the following: $K^* = 0.02$, $\sigma = 0.5$; the coefficients $b_i$ of f(x) amounted respectively to -9.1, -7, -3, -5, -1, 3.13. The X space was the interval [-11, 5].

Fig. 12.30. Typical change of the class center coordinates $b(x_k, n)$

Table 12.1. Algorithm functioning results

Table 12.2. Algorithm functioning results ($\sigma = 0.5$)
Table 12.3. Calculation with the modified algorithm

Modes i    Number of modes Z
1-4        4
1-5        5
1-6        5
1-7        7
1-8        8
12.6.2 


Neural Network with N' Output Channels


We consider here the neural network in the form of a neuron layer with the characteristics N = 1, N' = 3. Figure 12.31 represents the structure of this system. In this case, the class center coordinates are calculated according to Fig. 12.32, Table 12.4, and the expressions (12.7), which relate the class centers $b_1, \dots, b_4$ to the thresholds $a_{01}, a_{02}, a_{03}$.

It follows from (12.7) that

$$\frac{\partial R}{\partial a_{0i}} = 2\,[x(n) - b(y,n)]\,\frac{\partial b}{\partial y_i}\,\frac{\partial y_i}{\partial a_{0i}} \qquad (12.8)$$
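where $\partial y_i/\partial a_{0i} = -1$ according to Fig. 12.31. The vector $\partial b/\partial y_i$ is calculated as follows. Table 12.4 can be represented in the form of Table 12.5. Consequently,

$$\left[\frac{\partial b(y)}{\partial y_1},\ \frac{\partial b(y)}{\partial y_2},\ \frac{\partial b(y)}{\partial y_3}\right] = [\,b_2 - b_1,\ b_3 - b_2,\ b_4 - b_3\,]$$

The final algorithm for the coefficient adjustment has the following form:

$$a_{0i}(n+1) = a_{0i}(n) + K^*\,[x(n) - b(y,n)]\left.\frac{\partial b(y)}{\partial y_i}\right|_{y = y(n)} \qquad (12.9)$$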

Fig. 12.31. Block diagram of the neuron layer (N = 1): input x(n); outputs $y_1$, $y_2$, $y_3$

Fig. 12.32. To the class center coordinate calculation: class centers $b_1$, $b_2$, $b_3$, $b_4$ and thresholds $a_{01}$, $a_{02}$, $a_{03}$

Table 12.4. Values of b(y) of a neural network

The vector b(y,n) is calculated either according to the expressions described above or, in the case of a more complex structure, according to the recurrent expression described in the previous section. The experimentally investigated algorithm consists of the following stages:


1. Pattern x(n) enters the system input;

2. The initial values of the adjustable parameters $a_{0i}$ (i = 1, 2, 3) are selected randomly in the interval of x variation;

3. The values $b_1, \dots, b_4$ are calculated from the values $a_{0i}$;

4. The component of the vector b(y) that is closest to x(n) is selected;

5. The corresponding value is selected from the vector $\partial b/\partial y_i$;

6. The system coefficients are adjusted according to p. 1, 2, 4, 5 and expression (12.9);

7. Pattern x(n + 1) enters the system input, and the adjustment process continues starting from p. 3.
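The stages above can be sketched for N = 1, N' = 3 as follows. Two points are assumptions of this sketch, not statements of the text: the rule for stage 3 (interior centers are midpoints between neighboring thresholds, and the outer centers are placed symmetrically beyond the end thresholds, in the spirit of the $b_1$ expression given for the neuron with $K_p$ solutions in the next subsection), and the reading of stages 4-6 as applying (12.9) to every threshold with the center closest to x(n). The helper names are hypothetical.

```python
def centers_from_thresholds(t):
    """Stage 3 (assumed rule): b_1 < a_01 < b_2 < a_02 < b_3 < a_03 < b_4, with
    interior centers at midpoints and outer centers placed symmetrically."""
    first = t[0] - (t[1] - t[0]) / 2
    inner = [(t[i] + t[i + 1]) / 2 for i in range(len(t) - 1)]
    last = t[-1] + (t[-1] - t[-2]) / 2
    return [first] + inner + [last]


def update_thresholds(x, thresholds, centers, gain):
    """Stages 4-6: take the center closest to x(n) and apply (12.9) to every a_0i."""
    b = min(centers, key=lambda c: abs(x - c))
    # Vector db/dy_i = [b2 - b1, b3 - b2, b4 - b3]
    db_dy = [centers[i + 1] - centers[i] for i in range(len(thresholds))]
    return [a + gain * (x - b) * d for a, d in zip(thresholds, db_dy)]


def adjust_layer(samples, thresholds, gain):
    """Closed-cycle adjustment over a sequence of patterns x(n) (stage 7 loops back to stage 3)."""
    thresholds = list(thresholds)
    for x in samples:
        centers = centers_from_thresholds(thresholds)
        thresholds = update_thresholds(x, thresholds, centers, gain)
    return thresholds
```

For example, with thresholds (3, 5, 7) the centers are (2, 4, 6, 8); a pattern x = 4.5 selects b = 4, and with gain 0.1 every threshold moves up by 0.1.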




Figure 12.33 illustrates the system coefficient adjustment dynamics under two variants of the initial conditions (the mode coordinates are 3, 5, 7; the solid line shows one variant and the dashed line the other).
Fig. 12.34. Results obtained under different initial conditions and the same input sample length M

Fig. 12.35. Results obtained under the same initial conditions and different input sample lengths M: $M_1 = 150$; $M_2 = 300$; $M_3 = 450$; $M_4 = 600$


Figures 12.34 and 12.35 show results of the algorithm obtained under different initial conditions with the same input sample, and under the same initial conditions with input samples of different lengths. Thick lines link the class center coordinates at the start (n = 0) and the finish (n = M) of the adjustment process.


12.6.3 
Neuron with $K_p$ Solutions


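In this case,

$$y = 1 + \frac{1}{2}\sum_{j=1}^{K_p-1}\left[\operatorname{sign}(g - a_{j,j+1}) + 1\right],\qquad g(n) = a_1(n)\,x(n) - a_0(n)$$

Similar to the previous case,

$$\frac{\partial R}{\partial a} = 2\,[x(n) - b(y)]\,\frac{\partial b(y)}{\partial y}\,\frac{\partial y}{\partial a}$$

Using the expressions for $\partial y/\partial a$ (Chap. 9), one can obtain the recurrent relationships for the design of the corresponding system adjustment algorithm in the learning mode:

$$a_0(n+1) = a_0(n) + K_0\,[x(n) - b(y)]\,\frac{\partial b(y)}{\partial y},\qquad K_0 > 0$$

$$a_1(n+1) = a_1(n) - K_1\,[x(n) - b(y)]\,\frac{\partial b(y)}{\partial y}\,\operatorname{sign} x$$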

The adjustment algorithm consists of the following stages: 


1. The initial values of the adjustable parameters $a_0$ and $a_1$ are selected randomly in some given interval;

2. The current threshold values are calculated according to the open-loop system structure and the coefficients $a_0$ and $a_1$:

$$x_{j,j+1} = \frac{a_{j,j+1} + a_0(n)}{a_1(n)}$$

3. The values b(y) are calculated according to the expressions

$$b_1 = x_{1,2} - \frac{x_{2,3} - x_{1,2}}{2}$$

4. Pattern x(n + 1) enters the system input at the time instant n, and the y value is calculated;

5. The corresponding values b(y) and $\partial b(y)/\partial y$ are calculated from the y value;

6. The coefficients $a_0$ and $a_1$ are adjusted according to the above expressions;

7. The procedure is repeated starting from p. 2;

8. The procedure is repeated starting from p. 1, and the results are averaged over the set of injections of the initial conditions.
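Stages 2-4 can be sketched as below: the output y of the neuron with $K_p$ solutions, the thresholds mapped back into x space, and the class centers. Only the expression for $b_1$ comes from the text; taking the interior centers as midpoints and placing the last center symmetrically are assumptions of this sketch, and the mapping to x space assumes $a_1 > 0$ so that the threshold order is preserved. All function names are hypothetical.

```python
def sign(v):
    """sign() with sign(0) = +1 (a convention of this sketch)."""
    return 1 if v >= 0 else -1


def output_level(x, a0, a1, level_thresholds):
    """y = 1 + 1/2 * sum_j [sign(g - a_{j,j+1}) + 1] with g = a1*x - a0; y is in 1..K_p."""
    g = a1 * x - a0
    return 1 + sum((sign(g - t) + 1) // 2 for t in level_thresholds)


def x_thresholds(a0, a1, level_thresholds):
    """Stage 2: x_{j,j+1} = (a_{j,j+1} + a0) / a1 (assumes a1 > 0)."""
    return [(t + a0) / a1 for t in level_thresholds]


def class_centers(xt):
    """Stage 3: b_1 = x_12 - (x_23 - x_12)/2 as in the text; the interior midpoints
    and the symmetric rule for the last center are assumptions."""
    first = xt[0] - (xt[1] - xt[0]) / 2
    inner = [(xt[j] + xt[j + 1]) / 2 for j in range(len(xt) - 1)]
    last = xt[-1] + (xt[-1] - xt[-2]) / 2
    return [first] + inner + [last]
```

With $a_0 = 0$, $a_1 = 1$, and level thresholds (2, 4, 6), the x-space thresholds are (2, 4, 6), the centers are (1, 3, 5, 7), and x = 4.5 gives y = 3, so the selected center is `class_centers(...)[y - 1]` = 5.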
12.7
Two-Layer Neural Network in the Self-Learning Mode 


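A two-layer neural network with four two-solution neurons in the first layer and a neuron with $K_p = 5$ solutions in the second layer was considered first. In this case,

$$y = F(g) = 1 + \frac{1}{2}\sum_{i=1}^{K_p-1}\left[\operatorname{sign}(g - i) + 1\right],\qquad g = \sum_{j=1}^{H_1} a_j\,\operatorname{sign}\left(\sum_{l=1}^{N} a_{lj}\,x_l - a_{0j}\right) + 2.5$$

with $H_1 = 4$ first-layer neurons. The following expressions, obtained using the results of Chap. 9, form the basis for the design of the closed-loop two-layer neural network in the self-learning mode:

$$a_j(n+1) = a_j(n) + K^*\,[x(n) - b(y,n)]\,\frac{\partial b(y)}{\partial y}\,\operatorname{sign}\left(\sum_{l=1}^{N} a_{lj}(n)\,x_l(n) - a_{0j}(n)\right),\quad j = 1,\dots,4$$

$$a_{lj}(n+1) = a_{lj}(n) + K^*\,[x(n) - b(y,n)]\,\frac{\partial b(y)}{\partial y}\,a_j(n)\,x_l(n),\quad l = 1,\dots,N;\ j = 1,\dots,4$$

$$b(y, n+1) = b(y, n) + K^{**}\,[x(n) - b(y,n)]$$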


The experimental investigation of this algorithm showed a low convergence rate in the search for some local modes, because the two-solution neurons "desensitize" the information about the gradient of the secondary optimization functional. These neurons were therefore replaced by neurons with a continuum of solutions. In this case,

$$b(y, n+1) = b(y, n) + K^{**}\,[x - b(y,n)] \qquad (12.12)$$


The realized adjustment algorithm consists of the following stages: 


1. Random or deterministic initial values of the class center coordinates and of the adjustable coefficients of this two-layer neural network are entered into the system memory;

2. The set of vectors $\partial b(y)/\partial y$ is calculated;

3. Pattern x enters the neural network input;

4. The y value is calculated from the obtained x and the state of the multilayer neural network at the current time instant;

5. The corresponding vectors b(y) and $\partial b(y)/\partial y$ are selected according to the y value;

6. New values of the adjustable neural network coefficients and of the class centers are calculated using the results of p. 3, 5;

7. When the next pattern enters the input, p. 4-6 are repeated;

8. After a local extremum is detected, the algorithm is repeated according to p. 1-7.


Figure 12.36 shows the equal value lines of the distribution density f(x) used in the experimental investigation of this algorithm. The optimal values of the adjustable coefficients of the first layer neurons are $a_{01} = 9$; $a_{02} = 15$; $a_{03} = 21$; $a_{04} = 27$; $a_{11} = 1$; $a_{12} = 1$; $a_{13} = 1$; $a_{14} = 1$; $a_{21} = 1$; $a_{22} = 1$; $a_{23} = 1$; and $a_{24} = 1$.

The second layer neuron with $K_p$ solutions must realize the logical function represented in Table 12.6. Producing the correct solution y requires the correct formation of the corresponding intermediate value of the analog output signal, g(n) = y - 0.5.

Table 12.6. Logical function to be realized by the second layer neuron with $K_p$ solutions

y        1    2    3    4    5
y_1     -1   +1   +1   +1   +1
y_2     -1   -1   +1   +1   +1
y_3     -1   -1   -1   +1   +1
y_4     -1   -1   -1   -1   +1

On this basis, the system of algebraic equations for calculating the optimal coefficients of the second layer neuron is obtained:

$$\begin{aligned}
-a_1 - a_2 - a_3 - a_4 + 2.5 &= 0.5\\
a_1 - a_2 - a_3 - a_4 + 2.5 &= 1.5\\
a_1 + a_2 - a_3 - a_4 + 2.5 &= 2.5\\
a_1 + a_2 + a_3 - a_4 + 2.5 &= 3.5\\
a_1 + a_2 + a_3 + a_4 + 2.5 &= 4.5
\end{aligned}$$

Consequently, $a_1 = a_2 = a_3 = a_4 = 0.5$.
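The solution can be checked numerically. This sketch plugs $a_1 = \dots = a_4 = 0.5$ and the constant term 2.5 into every column of Table 12.6 and confirms that g = y - 0.5 and that the $K_p$-level output $1 + \tfrac{1}{2}\sum_i[\operatorname{sign}(g - i) + 1]$ recovers y. Placing the output thresholds at i = 1, ..., 4 is an assumption consistent with g = y - 0.5; the names are hypothetical.

```python
def sign(v):
    """sign() with sign(0) = +1 (g never hits an integer threshold here)."""
    return 1 if v >= 0 else -1


# Table 12.6: first-layer outputs (y_1, y_2, y_3, y_4) required for each solution y
TABLE_12_6 = {
    1: (-1, -1, -1, -1),
    2: (1, -1, -1, -1),
    3: (1, 1, -1, -1),
    4: (1, 1, 1, -1),
    5: (1, 1, 1, 1),
}

A = (0.5, 0.5, 0.5, 0.5)  # solution of the system above
OFFSET = 2.5              # constant term of the equations


def analog_output(first_layer, a=A, offset=OFFSET):
    """Intermediate analog signal g = a_1*y_1 + ... + a_4*y_4 + 2.5."""
    return sum(ai * yi for ai, yi in zip(a, first_layer)) + offset


def solution(g, k_p=5):
    """K_p-level output y = 1 + 1/2 * sum_{i=1}^{K_p-1} [sign(g - i) + 1]."""
    return 1 + sum((sign(g - i) + 1) // 2 for i in range(1, k_p))


if __name__ == "__main__":
    for y, outputs in TABLE_12_6.items():
        g = analog_output(outputs)
        print(y, g, solution(g))
```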


The experiments with this two-layer neural network were carried out according to the following plan:

1. The coefficients of the second layer neuron and the class center values are optimal. The following conditions were taken for the first layer neurons:

a The neuron coefficients were optimal;

b The initial values of the neuron coefficients had equal deviations from the optimal ones;

c The initial values of the neuron coefficients had different deviations from the optimal ones.

2. The second layer neuron coefficients were optimal; the conditions were similar to p. 1a, 1b but with initial class center values that were not optimal;

3. The first layer neuron coefficients and the class center values were optimal; the initial second layer neuron coefficients were not optimal;

4. Similar to p. 3 but with initial class center values that were not optimal;

5. The first and second layer neuron coefficients and the class center values were not optimal;

6. Experiments with different dispersions of the distributions representing the f(x) modes (Fig. 12.36);

7. All the aforementioned experiments were performed under different but deterministic initial conditions. The final experiment was based on random initial conditions for the neuron coefficients and the class centers.
Experiment p. 1a showed the stability of the two-layer neural network at the global extremum of a special average risk function. Figure 12.37 shows the results for the case when the initial neuron coefficients were optimal in both layers but the class centers were not. Figure 12.38 shows the results for the case when the initial coefficients of the first layer neurons were not optimal but the initial coefficients of the second layer neuron and the class centers were optimal. The selected location of the modes of the input signal distribution density makes the hyperplanes (I, II, III, IV) insensitive to rotation: the quality functional is almost constant here, which is confirmed experimentally. It is therefore advisable to investigate only the dynamics of the threshold $a_{0j}$ adjustment (j = 1, 2, 3, 4) for the given fixed location of the f(x) modes and, as a rule, not to investigate the dynamics of the hyperplane coefficients.

Fig. 12.39. Investigation results for the two-layer neural network in the self-learning mode: I - initial hyperplane position; II - optimal hyperplane position; III - hyperplane position after 3000 iterations

Figure 12.39 shows the results for the case when the initial coefficients of the second layer neuron were optimal but the initial values of the first layer neuron coefficients and of the class centers had equal negative deviations from their optimal values. The threshold adjustment dynamics is shown in Fig. 12.40 (solid line); the case of positive deviations is shown by the dot-and-dash line. Figure 12.40 (dashed line) and Fig. 12.41 show the experimental results for the case when the initial coefficients of the second layer neuron were optimal but the initial values of the first layer neuron coefficients and of the class centers deviated from their optimal values by amounts differing in sign and magnitude.

The goal of experiment p. 6 was to investigate the influence of the distribution dispersion on the recognition quality. The initial conditions in this experimental series were the same, whereas the dispersion $\sigma^2$ was varied (Fig. 12.42).

These experiments show that the self-learning problem can be solved for $\sigma^2$ not exceeding $\sigma^2_{max} = 1.5$. This is to be expected, because locally concentrated objects cannot be distinguished at large $\sigma^2$ (classes with a large intersection). Hence, self-learning methods based on such a distinction become invalid under these conditions.

Figure 12.43 shows the experimental results for the case when the initial coefficients of the first layer neurons and the class centers were optimal but the initial values of the second layer neuron coefficients were not.
Fig. 12.42. Investigation results for the two-layer neural network in the self-learning mode: $\sigma^2 = 1$; $\sigma^2 = 1.5$; $\sigma^2 = 2$; $\sigma^2 = 2.5$

Fig. 12.43.
The termination criterion of the adjustment procedure in this experiment was that the curves lie inside a "tube" with a diameter of 0.2 (Fig. 12.43) and a length of 5000 iterations. Figure 12.44 illustrates this experiment.

Figure 12.45 shows the experimental results with another distribution density form 
as compared with the typical density form f,(x). 
Fig. 12.44. Investigation results for the two-layer neural network in the self-learning mode: I, II, III, IV - numbers of the respective neurons of the first layer; optimal hyperplane position (dashed line); initial hyperplane position; hyperplane position after 5000 iterations

Fig. 12.45. Investigation results for the two-layer neural network in the self-learning mode


12.8
About Some Engineering Methods for the Selection of Matrix Parameters
in the Multilayer Neural Network Closed Cycle Adjustment Algorithms


If only the first derivative of the optimization functional is estimated in the closed-cycle adjustment algorithm, there is hardly enough information to make the matrix $K^*$ non-diagonal. In the simplest case, this matrix is the unit matrix multiplied by some constant or time-dependent coefficient. However, as the above experiments have shown, there are reasons to make this coefficient different for the adjustment of different layers of the multilayer neural network. As mentioned in Chap. 9, the main goal of using stochastic approximation methods is to provide zero random and dynamic errors in the determination of the adjustment coefficient vector in the stationary state. However, such methods increase the corresponding dynamic errors in the transient state, i.e., in the adjustment mode. Zero random error in the stationary state is hardly necessary during multilayer neural network adjustment. Some finite dispersion of the adjustment coefficient distribution is admissible because the secondary optimization functional is relatively smooth near its extremum point. This finite dispersion of the distribution f(a) does not significantly increase the secondary optimization functional, and it can be provided by a time-invariant matrix $K^*$. Two engineering methods for selecting the coefficients of the matrix $K^*$ are possible in this case. The first one is based on the analysis of the a priori given problem complexity, determined by the modality of f(x) at a fixed dimension of the feature space.

The second method, also based on the analysis of experiments, requires estimating the secondary optimization functional in the real situation during the multilayer neural network adjustment process. If the dependence of the secondary optimization functional on n oscillates strongly, $K^*$ must be decreased. If this dependence is sufficiently smooth, $K^*$ should be increased in order to decrease the systematic adjustment error, up to the onset of oscillations. The first method can be used to select the initial value of $K^*$ for the second method.
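The second engineering method can be expressed as a simple rule on the recorded values of the secondary optimization functional. In this hypothetical sketch, "strong oscillation" is measured as the number of direction changes between successive increments, and the threshold `max_changes` and the factors `shrink` and `grow` are illustrative tuning constants that the text does not specify.

```python
def count_direction_changes(values):
    """Number of sign changes between successive increments of a sequence."""
    steps = [b - a for a, b in zip(values, values[1:])]
    return sum(1 for s, t in zip(steps, steps[1:]) if s * t < 0)


def tune_gain(k, functional_history, max_changes=2, shrink=0.5, grow=1.2):
    """Decrease K* when the recent functional values oscillate, increase it when they are smooth.

    max_changes, shrink, and grow are hypothetical constants chosen for illustration.
    """
    if count_direction_changes(functional_history) > max_changes:
        return k * shrink
    return k * grow
```

In use, the value returned for one window of iterations becomes the $K^*$ for the next window, starting from an initial value chosen by the first method.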


12.9 
Design of the Multilayer Neural Network for the Matrix Inversion Problem 


Let us consider the design of the multilayer neural network and its closed-cycle adjustment algorithm for inverting a 2 × 2 matrix. Since the result is also a 2 × 2 matrix, the output of the multilayer neural network must consist of four neurons with a continuum of solutions. The minimal variant of the open-loop three-layer neural network structure is shown in Fig. 12.46 as a graph scheme.

The initial values of the adjustment coefficients of the first layer neurons must be selected so that, at $B \to \infty$, four hyperplanes divide the initial feature space into regions with equal hypervolumes. The initial conditions for the second and third layers must be selected in the same way as for the first layer, because neurons with a continuum of solutions are also used there.

The learning sample for the considered multilayer system is formed according to the following expressions:

$$X = \begin{bmatrix} x_1 & x_2 \\ x_3 & x_4 \end{bmatrix},\qquad X^{-1} = \begin{bmatrix} x_4/D & -x_2/D \\ -x_3/D & x_1/D \end{bmatrix},\qquad D = x_1x_4 - x_2x_3$$


There are practically no constraints on the amplitude of the multilayer neural network input signal. However, the amplitude of the output signal is limited to the interval [-1, +1] by the characteristics of the output neurons. The input signal must therefore be normalized so that no output signal component leaves the interval [-1, +1]. The normalization is performed as follows. Let X be the initial matrix and

$$\bar{x} = \max_i \{|x_i|\}$$

Dividing X by $\bar{x}$ gives the matrix $X/\bar{x}$ with elements in the interval [-1, +1]. Let us designate

$$D_1 = \begin{vmatrix} x_1/\bar{x} & x_2/\bar{x} \\ x_3/\bar{x} & x_4/\bar{x} \end{vmatrix} = \frac{D}{\bar{x}^2}$$

Then

$$X^{-1} = \begin{bmatrix} x_1 & x_2 \\ x_3 & x_4 \end{bmatrix}^{-1} = \frac{1}{\bar{x}^2 D_1}\begin{bmatrix} x_4 & -x_2 \\ -x_3 & x_1 \end{bmatrix} = \frac{1}{\bar{x} D_1}\begin{bmatrix} x_4/\bar{x} & -x_2/\bar{x} \\ -x_3/\bar{x} & x_1/\bar{x} \end{bmatrix}$$


Consequently, multiplying the elements of the matrix X at the input by $1/(\bar{x}D_1)$ and then applying the resulting matrix to the inversion system yields, since $\big(X/(\bar{x}D_1)\big)^{-1} = \bar{x}D_1X^{-1}$, the matrix

$$\frac{1}{\bar{x}}\begin{bmatrix} x_4 & -x_2 \\ -x_3 & x_1 \end{bmatrix}$$

with elements inside the interval [-1, +1]. Multiplying this matrix by $1/(\bar{x}D_1)$ gives the final result $X^{-1}$.
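The normalization can be checked with a short sketch. Here `invert_2x2` is a hypothetical stand-in for the trained inversion network (it computes the exact inverse), so the sketch only illustrates the scaling: the intermediate output stays inside [-1, +1], and the final rescaling recovers $X^{-1}$.

```python
def invert_2x2(m):
    """Exact 2x2 inverse adj(X) / D with D = x1*x4 - x2*x3 (stand-in for the network)."""
    (x1, x2), (x3, x4) = m
    d = x1 * x4 - x2 * x3
    if d == 0:
        raise ValueError("matrix is singular")
    return [[x4 / d, -x2 / d], [-x3 / d, x1 / d]]


def normalized_inversion(m, invert=invert_2x2):
    """Scale the input by 1/(x_bar*D_1), invert, then scale the output by 1/(x_bar*D_1).

    Returns (network_output, final_result); network_output equals adj(X)/x_bar.
    """
    (x1, x2), (x3, x4) = m
    x_bar = max(abs(v) for v in (x1, x2, x3, x4))
    if x_bar == 0:
        raise ValueError("matrix is singular")
    d1 = (x1 * x4 - x2 * x3) / x_bar ** 2      # D_1, determinant of X / x_bar
    if d1 == 0:
        raise ValueError("matrix is singular")
    scale = 1.0 / (x_bar * d1)
    net_input = [[scale * v for v in row] for row in m]
    net_output = invert(net_input)             # elements lie in [-1, +1]
    result = [[scale * v for v in row] for row in net_output]
    return net_output, result


if __name__ == "__main__":
    out, res = normalized_inversion([[4, 7], [2, 6]])
    print(out)
    print(res)
```

For X = [[4, 7], [2, 6]], $\bar{x} = 7$ and D = 10, so the input is multiplied by 0.7, the intermediate output is $\operatorname{adj}(X)/7$, and the final result is $X^{-1}$ = [[0.6, -0.7], [-0.2, 0.4]] up to rounding.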
The structure of the open-loop multilayer neural network is described by the following relationships:
The teacher instruction $\varepsilon$ for the multilayer system must be formed algorithmically, using one of the known matrix inversion algorithms with control of the inversion precision. The expressions for the root-mean-square error of the elements of the matrix

These expressions form the basis for the design of the adaptation algorithm of the multilayer system aimed at the inversion of a 2 × 2 matrix.
12.10
Design of the Multilayer Neural Network for the Number Transformation
from the Binary System into the Decimal One


The transformation of a four-bit binary number will be considered as an example. After the closed-cycle adjustment mode terminates, the system must realize the desired input-output relationship described by the multiple-valued logic function $\varepsilon(x)$ given in Table 12.7.

Table 12.7 provides the learning sample at the system input, together with the teacher instruction $\varepsilon$, by selection of the table columns.

The open-loop neural network is described in this case by the following expression:


Kp-l 
ie. 3 2 4 2 4 
x3=14+— > sign y Anshy =arctgB) > Ann, —arctg B > hy hy*hy ~Fkp skp +1 
ay et hy=l . i=l =0 
p 2 i ho 
Thy skp =k, —1; Kp=10 
Consequently, 
4 4 
These expressions form the basis for the design of the adjustment algorithm of the multilayer system aimed at transforming numbers from the binary system into the decimal one.
12.11
Investigation of the Multilayer Neural Network under the Arbitrary
Teacher Qualification


The design of the optimal neural network model for arbitrary objective and subjective teacher qualification was described in Chap. 5. Below we consider the case K = 2 with an arbitrary objective teacher qualification $b_0$.

The pattern recognition system was a two-layer neural network built of neurons with arctangent characteristics and $\beta = 5$. The adjustment algorithm was modeled in the learning ($b_s = 1$) and self-learning ($b_s = 0$) modes, where $b_s$ denotes the subjective teacher qualification. The block diagram of the algorithm is shown in Fig. 12.47.

The main aim of the experimental study was to test whether the system is operable. The plan of experiments included two main points:


1. Investigation of the system behavior with optimal coefficient values and different relationships between $b_0$ and $b_s$;

2. Investigation of the system dynamics for different $b_0$ and $b_s$ and non-optimal initial neuron coefficients.


A pseudo-random-number generator with a distribution close to the normal one and with equal covariance matrices for both classes was used as the input signal generator. The experimental study gave the following results for item 1:
Fig. 12.47. Block diagram of the neural network with subjective teacher qualification: 1 - adder unit; 2 - nonlinear conversion device; 3 - units of gradient calculations; 4 - multiplier unit
Fig. 12.48. Dynamics of the system coefficient changes under $b_s = 1$ and different $b_0$
1. The system coefficients oscillate around their optimal values when $b_0 = b_s$;

2. The system gradually detunes when $b_s = 1$; the detuning increases as $b_0$ changes from 1 to 0;

3. The system remains in the optimal state when $b_s = 0$, independently of $b_0$.


The investigations with non-optimal initial coefficients showed that for $b_0 = b_s$ the adjustment procedure brings the system to the optimal state. For $b_s = 1$ and $b_s \ne b_0$, the system cannot be adjusted even after a long adjustment (Fig. 12.48).


12.12 
Analytical Methods of Investigations of the Neural Network 
Closed Cycle Adjustment 


We describe below the general analytical methods for investigating the closed-cycle adjustment of neural networks. Particular examples are used for illustration, and the difficulties of these methods are discussed.

The general methods of analysis of closed-cycle neural network adjustment are similar in structure to the methods used to analyze open-cycle adjustment. They include the following stages:


1. The analysis of the probability distribution density of the estimate of the gradient vector of the secondary optimization functional;

2. The derivation of the stochastic difference equation for the change of the distribution density of the adjustable coefficients in the course of the adjustment procedure;

3. The solution of this equation;

4. The determination of the probability distribution of correct recognition by integrating over the feature space and over the neural network state space (the space of the adjustment coefficients).




Chapter 12 - Analysis of Closed-Loop Multilayer Neural Networks 


In principle, the parameter matrix $K^*$ should be selected according to the results of stage 3. However, as will be shown below, this task is rather complicated. In this section, we consider a linear threshold element with the optimization criterion of the minimum magnitude of the first discrete error moment.

The recurrent relationship for the neuron with $|\alpha_{1g}|$ minimization in the case $N = m_n = 1$ was obtained in the following form:

$$a_0(n+1) = a_0(n) - K^*x_g(n).$$
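The first stage of analysis. This is the problem of a random walk on a one-dimensional grid. Such a walk is described by a Markov chain with an infinite number of states. The probabilities of transition from the state $mK^*$ to the states $(m+1)K^*$, $(m-1)K^*$, and $mK^*$ are

$$P[mK^* \mid (m+1)K^*] = \tfrac12\left[1 - \Phi_1(mK^*)\right],$$
$$P[mK^* \mid (m-1)K^*] = \tfrac12\,\Phi_2(mK^*),$$
$$P[mK^* \mid mK^*] = \tfrac12\left[1 - \Phi_2(mK^*) + \Phi_1(mK^*)\right].$$

Here $\Phi_1$ and $\Phi_2$ are the cumulative detection probabilities, i.e., the cumulative distribution functions of the feature x for the two pattern classes.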
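The Markov chain of the first stage is easy to simulate. The Python sketch below is a minimal illustration under assumptions that are not in the text: the feature is taken to have hypothetical normal distributions, so that $\Phi_1$ is the N(-1, 1) distribution function and $\Phi_2$ the N(+1, 1) one, and $K^* = 0.25$. For this symmetric choice the walk should spend most of its time at $m = 0$, with a mean threshold close to zero.

```python
import math
import random

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def phi1(a):
    return normal_cdf(a, mu=-1.0)   # hypothetical Phi_1

def phi2(a):
    return normal_cdf(a, mu=1.0)    # hypothetical Phi_2

def simulate_threshold_walk(k_star=0.25, steps=100_000, seed=1):
    """Random walk of the threshold a_0 = m * K*; returns visit counts per state m."""
    rng = random.Random(seed)
    m = 0                                     # a_0(0) = 0
    visits = {}
    for _ in range(steps):
        a = m * k_star
        p_up = 0.5 * (1.0 - phi1(a))          # probability of moving to (m+1)K*
        p_down = 0.5 * phi2(a)                # probability of moving to (m-1)K*
        u = rng.random()
        if u < p_up:
            m += 1
        elif u < p_up + p_down:
            m -= 1
        # otherwise the walk stays at mK*
        visits[m] = visits.get(m, 0) + 1
    return visits

visits = simulate_threshold_walk()
total = sum(visits.values())
mode = max(visits, key=visits.get)
mean = sum(m * 0.25 * c for m, c in visits.items()) / total
print("most visited m:", mode, "mean threshold:", round(mean, 3))
```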


The second stage of analysis. The stochastic difference equation for the change of the probability distribution density of the threshold $a_0$ has the following form:

$$W_{n+1}(mK^*) = W_n[(m-1)K^*]\cdot\tfrac12\{1 - \Phi_1[(m-1)K^*]\} + W_n(mK^*)\cdot\tfrac12\left[\Phi_1(mK^*) + 1 - \Phi_2(mK^*)\right] + W_n[(m+1)K^*]\cdot\tfrac12\,\Phi_2[(m+1)K^*].$$
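The second-stage equation can also be iterated directly. The sketch below is an illustration, not the author's procedure: it assumes hypothetical normal class laws ($\Phi_1$ from N(-1, 1), $\Phi_2$ from N(2, 1)), $K^* = 0.25$, a grid truncated to $|m| \le 60$, and the initial condition $a_0(0) = 0$. Each step moves probability mass along the three transitions, which is the same bookkeeping as the three terms of the equation, written from the point of view of the source state.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def phi1(a):
    return normal_cdf(a, mu=-1.0)   # hypothetical Phi_1

def phi2(a):
    return normal_cdf(a, mu=2.0)    # hypothetical Phi_2

def iterate_density(cdf1, cdf2, k_star, m_max, steps):
    """Iterate W_{n+1} from W_n on m = -m_max..m_max (list index j = m + m_max)."""
    size = 2 * m_max + 1
    w = [0.0] * size
    w[m_max] = 1.0                                   # W_0 concentrated at a_0(0) = 0
    grid = [(j - m_max) * k_star for j in range(size)]
    up = [0.5 * (1.0 - cdf1(a)) for a in grid]       # mK* -> (m+1)K*
    down = [0.5 * cdf2(a) for a in grid]             # mK* -> (m-1)K*
    stay = [0.5 * (1.0 + cdf1(a) - cdf2(a)) for a in grid]
    for _ in range(steps):
        new = [0.0] * size
        for j in range(size):
            new[j] += w[j] * stay[j]
            if j + 1 < size:
                new[j + 1] += w[j] * up[j]
            if j >= 1:
                new[j - 1] += w[j] * down[j]
        w = new
    return w

w = iterate_density(phi1, phi2, k_star=0.25, m_max=60, steps=2000)
mode_m = max(range(len(w)), key=w.__getitem__) - 60
print("mode of W at a_0 =", mode_m * 0.25)
```

At the truncation edges the outward transition probabilities are numerically zero for these laws, so the total mass stays equal to one.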


The third stage of analysis. Solving this stochastic difference equation is rather difficult. Let us consider the stationary state ($n = \infty$). Taking $a_0(0) = 0$ and passing to the limit $n \to \infty$, one obtains

$$W[(m-1)K^*]\cdot\tfrac12\{1 - \Phi_1[(m-1)K^*]\} + W[(m+1)K^*]\cdot\tfrac12\,\Phi_2[(m+1)K^*] - W(mK^*)\cdot\tfrac12\left[1 - \Phi_1(mK^*) + \Phi_2(mK^*)\right] = 0.$$

Consequently,
It follows from the normalization condition for the distribution density of coefficient $a_0$ over m that $W(mK^*) \to 0$ as $m \to \pm\infty$. Consequently, $C = 0$ and

$$W[(m-1)K^*]\,\{1 - \Phi_1[(m-1)K^*]\} = W(mK^*)\,\Phi_2(mK^*).$$


Taking $W(0) = A$, one obtains

$$W(K^*) = A\,\frac{1 - \Phi_1(0)}{\Phi_2(K^*)},\qquad W(2K^*) = W(K^*)\,\frac{1 - \Phi_1(K^*)}{\Phi_2(2K^*)}.$$
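In the general case,

$$W(mK^*) = A\prod_{i=1}^{m}\frac{1 - \Phi_1[(i-1)K^*]}{\Phi_2(iK^*)},\quad m > 0;\qquad W(mK^*) = A\prod_{i=m+1}^{0}\frac{\Phi_2(iK^*)}{1 - \Phi_1[(i-1)K^*]},\quad m < 0. \tag{12.13}$$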
The value $A = W(0)$ is determined from the normalization condition of the density W over m. The function $W(\cdot)$ is the distribution density of the adjusted coefficient $a_0$ in the stationary state. The function $\tfrac12[1 - \Phi_1(mK^*)]$ decreases monotonically from ½ to 0 on the interval $-\infty < mK^* < +\infty$, and the function $\tfrac12\Phi_2(mK^*)$ increases monotonically from 0 to ½ on the same interval. The density W reaches its maximum where these two transition probabilities are equal, i.e., at the root of the equation

$$1 - \Phi_1(mK^*) = \Phi_2(mK^*). \tag{12.14}$$


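Let the root of Eq. (12.14) be $mK^* = \theta$. Then for $mK^* > \theta$

$$1 - \Phi_1(mK^*) < \Phi_2(mK^*) < \Phi_2[(m+1)K^*],\quad\text{i.e.,}\quad \frac{W[(m+1)K^*]}{W(mK^*)} = \frac{1 - \Phi_1(mK^*)}{\Phi_2[(m+1)K^*]} < 1.$$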




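Respectively, for $mK^* < \theta$ one obtains

$$\Phi_2(mK^*) < 1 - \Phi_1(mK^*) < 1 - \Phi_1[(m-1)K^*],\quad\text{i.e.,}\quad \frac{W[(m-1)K^*]}{W(mK^*)} = \frac{\Phi_2(mK^*)}{1 - \Phi_1[(m-1)K^*]} < 1.$$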


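Therefore, if $\theta/K^*$ is an integer, then

$$W(\theta - K^*) = W(\theta)\,\frac{\Phi_2(\theta)}{1 - \Phi_1(\theta - K^*)} < W(\theta),\qquad W(\theta + K^*) = W(\theta)\,\frac{1 - \Phi_1(\theta)}{\Phi_2(\theta + K^*)} < W(\theta).$$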


Consequently, θ is the mode of the distribution of the threshold regarded as a random variable, and it equalizes the conditional risk functions for the pattern sets of the first and second classes.

It follows from (12.13) that the mathematical expectation and the variance of the threshold distribution are finite.
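Formula (12.13) can be evaluated directly. Under hypothetical assumptions chosen only for illustration ($\Phi_1$ from N(-1, 1), $\Phi_2$ from N(2, 1), $K^* = 0.25$, grid truncated to $|m| \le 40$), the Python sketch below builds both branches of (12.13), fixes A by normalization, and reports the mode together with the mean and variance. For these laws the root of (12.14) is $\theta = 0.5$, and $\theta/K^* = 2$ is an integer, so the mode should fall at $m = 2$.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def phi1(a):
    return normal_cdf(a, mu=-1.0)   # hypothetical Phi_1

def phi2(a):
    return normal_cdf(a, mu=2.0)    # hypothetical Phi_2

def stationary_density(cdf1, cdf2, k_star, m_max):
    """Evaluate both branches of (12.13) on m = -m_max..m_max, then normalize (fixes A)."""
    w = {0: 1.0}                                        # provisional A = 1
    for m in range(1, m_max + 1):                       # branch m > 0
        w[m] = w[m - 1] * (1.0 - cdf1((m - 1) * k_star)) / cdf2(m * k_star)
    for m in range(-1, -m_max - 1, -1):                 # branch m < 0
        w[m] = w[m + 1] * cdf2((m + 1) * k_star) / (1.0 - cdf1(m * k_star))
    total = sum(w.values())
    return {m: v / total for m, v in w.items()}

k_star = 0.25
dens = stationary_density(phi1, phi2, k_star, m_max=40)
mode_m = max(dens, key=dens.get)
mean = sum(m * k_star * p for m, p in dens.items())
var = sum((m * k_star - mean) ** 2 * p for m, p in dens.items())
print("mode:", mode_m * k_star, "mean:", round(mean, 3), "variance:", round(var, 4))
```

The finite mean and variance printed here are an instance of the finiteness statement above; they depend entirely on the assumed class laws.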

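For the neuron with arbitrary memory $m_n$ in the adjustment block ($m_n = \text{const}$, $N = 1$),

$$a_0(n+1) = \begin{cases} a_0(n), & n \ne i\,m_n,\quad i = 1, 2, 3, \dots\\[1mm] a_0(n) - K^*\displaystyle\sum_{l=n-m_n+1}^{n}\{\varepsilon(l) - \operatorname{sign}[x(l) - a_0(l)]\}, & n = i\,m_n \end{cases} \tag{12.15}$$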


Expression (12.15) is valid despite the remark made in Chap. 9 that an analytical adjustment algorithm with arbitrary $m_n$ cannot be designed in the general case of the minimum $|\alpha_{1g}|$ criterion and a system with two solutions. The reason is that (12.15) concerns the particular one-dimensional case ($N = 1$) with $x_0 = -1 = \text{const}$. The coefficient adjustment in (12.15) is performed after each cycle of $m_n$ patterns arriving at the system input.

The expression for the transition probability in this Markov chain is similar to the case $m_n = 1$:

$$P[x_g(n) = -2] = \tfrac12\left[1 - \Phi_1(mK^*)\right],$$
$$P[x_g(n) = 2] = \tfrac12\,\Phi_2(mK^*),$$
$$P[x_g(n) = 0] = \tfrac12\left[1 - \Phi_2(mK^*) + \Phi_1(mK^*)\right],$$

where $mK^*$ is the current value of the adjustment coefficient $a_0$. The value
amounts to

This is the case of the multinomial distribution

Here $l$, $t$, and $m_n - l - t$ are, respectively, the numbers of times the values +1, -1, and 0 appear in $x_g$.


After the change of variables $\xi = l - t$, one can obtain the constraints on the range of the variables in the following form:

The expression for the transition probability in the case $\xi > 0$ has the following form:


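$$P[mK^* \mid (m+\xi)K^*] = \frac{1}{2^{m_n}}\sum_{t=0}^{\lfloor (m_n-\xi)/2\rfloor}\frac{m_n!}{(t+\xi)!\,t!\,(m_n-2t-\xi)!}\,\left[1 - \Phi_1(mK^*)\right]^{t+\xi}\,\Phi_2^{\,t}(mK^*)\,\left[1 + \Phi_1(mK^*) - \Phi_2(mK^*)\right]^{m_n-2t-\xi}$$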


For $\xi < 0$ the transition probability has the same form, with the lower summation limit replaced by $-\xi$. Both cases are unified by taking the lower limit as $\max[0, -\xi]$.
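The unified expression can be checked against a brute-force enumeration of all $3^{m_n}$ step sequences within one cycle. In the Python sketch below the function names are hypothetical; `p_up`, `p_down` and `p_zero` stand for $\tfrac12[1 - \Phi_1(mK^*)]$, $\tfrac12\Phi_2(mK^*)$ and $\tfrac12[1 + \Phi_1(mK^*) - \Phi_2(mK^*)]$ at the current state (so the factor $2^{-m_n}$ is already included in them), and the numeric values are arbitrary.

```python
from itertools import product
from math import factorial

def transition_prob(xi, m_n, p_up, p_down, p_zero):
    """Probability of a net shift of xi grid steps over one cycle of m_n patterns."""
    total = 0.0
    for t in range(max(0, -xi), (m_n - xi) // 2 + 1):   # t = number of down-steps
        n_up = t + xi                                    # number of up-steps
        n_zero = m_n - n_up - t                          # number of zero steps
        coef = factorial(m_n) // (factorial(n_up) * factorial(t) * factorial(n_zero))
        total += coef * p_up ** n_up * p_down ** t * p_zero ** n_zero
    return total

def transition_prob_brute(xi, m_n, p_up, p_down, p_zero):
    """Same probability by enumerating every sequence of m_n steps."""
    probs = {1: p_up, -1: p_down, 0: p_zero}
    total = 0.0
    for seq in product((1, -1, 0), repeat=m_n):
        if sum(seq) == xi:
            p = 1.0
            for s in seq:
                p *= probs[s]
            total += p
    return total

m_n, p_up, p_down, p_zero = 4, 0.3, 0.2, 0.5
for xi in range(-m_n, m_n + 1):
    print(xi, round(transition_prob(xi, m_n, p_up, p_down, p_zero), 6))
```

The loop bounds implement the lower limit $\max[0, -\xi]$ and the upper limit $\lfloor (m_n - \xi)/2 \rfloor$ directly.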

The stochastic difference equation for the distribution density of the adjusted coefficient $a_0$, corresponding to the second stage of analysis, has the form

$$W_{i m_n}(mK^*) = \sum_{k=-m_n}^{m_n} W_{(i-1)m_n}[(m-k)K^*]\;P[(m-k)K^* \mid mK^*],$$

where $P[\cdot]$ is given by the above expression for the transition probability.
The recurrent relationship for the closed-loop system with $m_n = 1$ in the multidimensional case can be written in the form

$$\mathbf a(n+1) = \mathbf a(n) + K^*x_g(n)\,\operatorname{sign}\mathbf x(n).$$

This is a random walk on an (N + 1)-dimensional grid, described by a multidimensional Markov chain. The analysis of the closed-loop system here consists of deriving the transition probabilities, deriving the stochastic equation, and investigating its solution.

These problems are rather complicated to solve even in this relatively simple case.
Neuron with a solution continuum and a continuum of pattern classes. Let us consider the case $N = m_n = 1$. Using the criterion of minimization of the second discrete error moment $\alpha_{2g}$, one obtains


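$$a_0(n+1) = a_0(n) - 2K^*\{\varepsilon(n) - F[x(n) - a_0(n)]\}\left.\frac{dF(g)}{dg}\right|_{g = x(n) - a_0(n)} \tag{12.16}$$

Here

$$x_g(n) = \varepsilon(n) - F[x(n) - a_0(n)],\qquad \varphi(g) = \frac{dF(g)}{dg}.$$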
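Rule (12.16) is a stochastic gradient step on the squared discrete error. The minimal Python simulation below relies entirely on assumptions that are not in the text: $F = \tanh$ (so $\varphi(g) = 1 - \tanh^2 g$), features $x \sim N(0, 1)$, and a hypothetical continuous teacher $\varepsilon(n) = F[x(n) - 0.5]$ plus small Gaussian noise. Under these assumptions the threshold should settle near 0.5.

```python
import math
import random

def F(g):
    """Assumed neuron characteristic F(g) = tanh(g)."""
    return math.tanh(g)

def dF(g):
    """phi(g) = dF/dg for F = tanh."""
    return 1.0 - math.tanh(g) ** 2

def adjust_threshold(k_star=0.01, steps=20_000, a_true=0.5, noise=0.1, seed=0):
    rng = random.Random(seed)
    a0 = 0.0
    for _ in range(steps):
        x = rng.gauss(0.0, 1.0)                        # feature x(n)
        eps = F(x - a_true) + rng.gauss(0.0, noise)    # hypothetical continuous teacher
        g = x - a0
        x_g = eps - F(g)                               # discrete error x_g(n)
        a0 -= 2.0 * k_star * x_g * dF(g)               # rule (12.16)
    return a0

a_hat = adjust_threshold()
print(round(a_hat, 3))
```

Running the same loop many times with different seeds and histogramming `a_hat` gives an empirical picture of the distribution density whose evolution is derived analytically in what follows.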


Let us introduce the following random variables: $A_0[n]$, $Z[n]$, $L[n]$, $X[n]$, $E[n]$, $G[n]$, and $Y[n]$. Their possible values are $a_0[n]$, $z[n]$, $l[n]$, $x[n]$, $\varepsilon[n]$, $g[n]$, and $y[n]$. The value $G[n]$ is a function of the random variables $A_0[n]$ and $X[n]$:

$$G[n] = X[n] - A_0[n],$$

and $Y[n]$ is a function of the random variable $G[n]$:

$$Y[n] = \varphi(G[n]).$$

The value $Z[n]$ is determined in the following way:

$$Z[n] = \{E[n] - F(G[n])\}\,\varphi(G[n]),\qquad L[n] = E[n] - F(G[n]);\quad l[n] = x_g(n).$$


The distribution density of $A_0[n+1]$ will be found in the following form:

$$f[a_0(n+1)] = \int_{-\infty}^{\infty} f[a_0(n+1), x(n)]\,dx(n) = \int_{-\infty}^{\infty} f[a_0(n+1)\mid x(n)]\,f[x(n)]\,dx(n) \tag{12.17}$$
In order to determine $f[a_0(n+1)\mid x(n)]$, one needs to determine $P[z(n)\mid a_0(n), x(n)]$:

$$P[z(n)\mid a_0(n), x(n)] = \iint_{x_g(n)\,y(n) < z(n)} f[x_g(n), y(n)\mid a_0(n), x(n)]\,dx_g(n)\,dy(n),$$

$$f[x_g(n), y(n)\mid x(n), a_0(n)] = f[x_g(n)\mid y(n), x(n), a_0(n)]\;f[y(n)\mid x(n), a_0(n)],$$

$$f[y(n)\mid x(n), a_0(n)] = \delta\{y(n) - \varphi[x(n) - a_0(n)]\},$$

$$f[x_g(n)\mid y(n), x(n), a_0(n)] = f[x_g(n)\mid x(n), a_0(n)],$$

because the random value Y is a deterministic function of the random values X and $A_0$:
$$f[x_g(n)\mid x(n), a_0(n)] = f_\varepsilon\{x_g(n) + F[x(n) - a_0(n)] \mid x(n), a_0(n)\},\qquad f_\varepsilon[\varepsilon(n)\mid x(n), a_0(n)] = f_\varepsilon[\varepsilon(n)\mid x(n)].$$

Consequently,

$$f[x_g(n)\mid x(n), a_0(n)] = f_\varepsilon\{x_g(n) + F[x(n) - a_0(n)] \mid x(n)\},$$

where, for fixed $a_0(n)$, the right-hand side is a new function of $x_g(n)$.
As a result, one obtains

$$f[x_g(n), y(n)\mid x(n), a_0(n)] = f_\varepsilon\{x_g(n) + F[x(n) - a_0(n)] \mid x(n)\}\;\delta\{y(n) - \varphi[x(n) - a_0(n)]\}.$$
Let us define 
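Then the distribution density of $Z(n)$ for given $X(n)$ and $A_0(n)$ is

$$f[z(n)\mid x(n), a_0(n)] = \frac{\partial P[z(n)\mid a_0(n), x(n)]}{\partial z(n)} = \int_{-\infty}^{\infty}\frac{1}{|y(n)|}\,f_\varepsilon\left\{\frac{z(n)}{y(n)} + F[x(n) - a_0(n)]\;\middle|\;x(n)\right\}\delta\{y(n) - \varphi[x(n) - a_0(n)]\}\,dy(n)$$

$$= \frac{1}{|\varphi[x(n) - a_0(n)]|}\,f_\varepsilon\left\{\frac{z(n)}{\varphi[x(n) - a_0(n)]} + F[x(n) - a_0(n)]\;\middle|\;x(n)\right\}.$$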
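The cumulative distribution of the random value $A_0(n+1)$ for given $X(n)$ has the following form:

$$P[a_0(n+1)\mid x(n)] = \iint_{a_0(n) - 2K^*z(n) < a_0(n+1)} f[z(n)\mid a_0(n), x(n)]\;f[a_0(n)\mid x(n)]\,da_0(n)\,dz(n).$$

Since

$$f[a_0(n)\mid x(n)] = f_n[a_0(n)],$$

then

$$P[a_0(n+1)\mid x(n)] = \int_{-\infty}^{\infty} f_n[a_0(n)]\int_{\frac{a_0(n+1) - a_0(n)}{-2K^*}}^{\infty}\frac{1}{|\varphi[x(n) - a_0(n)]|}\,f_\varepsilon\left\{\frac{z(n)}{\varphi[x(n) - a_0(n)]} + F[x(n) - a_0(n)]\;\middle|\;x(n)\right\}dz(n)\,da_0(n).$$

Consequently,

$$f_{n+1}[a_0(n+1)\mid x(n)] = \frac{\partial P[a_0(n+1)\mid x(n)]}{\partial a_0(n+1)} = \frac{1}{2K^*}\int_{-\infty}^{\infty}\frac{f_n[a_0(n)]}{|\varphi[x(n) - a_0(n)]|}\,f_\varepsilon\left\{\frac{a_0(n+1) - a_0(n)}{-2K^*\varphi[x(n) - a_0(n)]} + F[x(n) - a_0(n)]\;\middle|\;x(n)\right\}da_0(n).$$

One obtains finally

$$f_{n+1}[a_0(n+1)] = \frac{1}{2K^*}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\frac{f_n[a_0(n)]\,f[x(n)]}{|\varphi[x(n) - a_0(n)]|}\,f_\varepsilon\left\{\frac{a_0(n+1) - a_0(n)}{-2K^*\varphi[x(n) - a_0(n)]} + F[x(n) - a_0(n)]\;\middle|\;x(n)\right\}dx(n)\,da_0(n) \tag{12.18}$$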

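In the limiting case, as $n \to \infty$,

$$f^{\infty}(a) = \frac{1}{2K^*}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\frac{f^{\infty}(\xi)\,f(x)}{|\varphi(x - \xi)|}\,f_\varepsilon\left\{\frac{a - \xi}{-2K^*\varphi(x - \xi)} + F(x - \xi)\;\middle|\;x\right\}dx\,d\xi.$$

This is a homogeneous Fredholm integral equation of the second kind; in the general case it can be solved numerically. The expression for $f_{n+1}[a_0(n+1)]$ integrates a non-negative function, hence $f_{n+1}[a_0(n+1)] \ge 0$. It is evident that at $n = 0$

$$f_1[a_0(1)] = \frac{1}{2K^*}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\frac{f_0(\xi)\,f(x)}{|\varphi(x - \xi)|}\,f_\varepsilon\left\{\frac{a_0(1) - \xi}{-2K^*\varphi(x - \xi)} + F(x - \xi)\;\middle|\;x\right\}dx\,d\xi,$$

$$\int_{-\infty}^{\infty} f_0[a_0(0)]\,da_0(0) = \int_{-\infty}^{\infty}\delta[a_0(0) - \hat a_0]\,da_0(0) = 1,$$

where $\hat a_0$ is some given threshold value.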
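Let us assume that

$$\int_{-\infty}^{\infty} f_n[a_0(n)]\,da_0(n) = 1.$$

Let us show that in this case

$$\int_{-\infty}^{\infty} f_{n+1}[a_0(n+1)]\,da_0(n+1) = 1.$$

Indeed,

$$\int_{-\infty}^{\infty} f_{n+1}[a_0(n+1)]\,da_0(n+1) = \frac{1}{2K^*}\iiint\frac{f_n[a_0(n)]\,f[x(n)]}{|\varphi[x(n) - a_0(n)]|}\,f_\varepsilon\left\{\frac{a_0(n+1) - a_0(n)}{-2K^*\varphi[x(n) - a_0(n)]} + F[x(n) - a_0(n)]\;\middle|\;x(n)\right\}dx(n)\,da_0(n)\,da_0(n+1).$$

Let us make the change of variables

$$\varepsilon(n) = \frac{a_0(n+1) - a_0(n)}{-2K^*\varphi[x(n) - a_0(n)]} + F[x(n) - a_0(n)],\qquad |d\varepsilon(n)| = \frac{da_0(n+1)}{2K^*\,|\varphi[x(n) - a_0(n)]|}.$$

Then

$$\int_{-\infty}^{\infty} f_{n+1}[a_0(n+1)]\,da_0(n+1) = \iiint f_n[a_0(n)]\,f[x(n)]\,f_\varepsilon[\varepsilon(n)\mid x(n)]\,d\varepsilon(n)\,da_0(n)\,dx(n) = \int_{-\infty}^{\infty} f_n[a_0(n)]\left\{\iint f[x(n)]\,f_\varepsilon[\varepsilon(n)\mid x(n)]\,d\varepsilon(n)\,dx(n)\right\}da_0(n).$$

Due to the properties of distribution densities, the inner double integral equals 1, so

$$\int_{-\infty}^{\infty} f_n[a_0(n)]\left\{\iint f[x(n)]\,f_\varepsilon[\varepsilon(n)\mid x(n)]\,d\varepsilon(n)\,dx(n)\right\}da_0(n) = \int_{-\infty}^{\infty} f_n[a_0(n)]\,da_0(n),$$

and, due to the assumption,

$$\int_{-\infty}^{\infty} f_n[a_0(n)]\,da_0(n) = 1.$$

Consequently,

$$\int_{-\infty}^{\infty} f_{n+1}[a_0(n+1)]\,da_0(n+1) = 1,$$

which was to be proven.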

Similar expressions can be obtained for the cases $m_n = \text{const}$ and $N \ne 1$, as well as for more complex multilayer neural network structures, but the complexity of the resulting expressions increases sharply, and analyzing them in explicit form is pointless. Instead, one must pass to the distribution of the probability of correct recognition by integrating over the space of adjustment coefficients, which is a rather complex task. One can only write general expressions for the mathematical expectation and variance of the average risk function:

$$MR = \int_A f^{\infty}(\mathbf a)\,R(\mathbf a)\,d\mathbf a,\qquad R(\mathbf a) = \int_X\int_E f(\varepsilon)\,f(\mathbf x\mid\varepsilon)\,\left|\varepsilon - F(\mathbf x, \mathbf a)\right|\,d\varepsilon\,d\mathbf x,$$

$$DR = \int_A f^{\infty}(\mathbf a)\,[R(\mathbf a) - MR]^2\,d\mathbf a.$$
A 


The complexity of the analytical investigation of closed-loop systems with fixed structure, demonstrated above, makes it necessary to use statistical modeling methods.
Chapter 13


Synthesis of Multilayer Neural Networks 
with Flexible Structure 


The design of multilayer neural networks with fixed structure and closed-cycle adjustment, unlike open-cycle adjustment, does not require a priori information about the input signal. However, in the closed-cycle case the probability of correct recognition is restricted by the fixed structure of the neural network. This chapter deals with the synthesis of neural networks with flexible structure (Fig. 13.1), which is selected in the course of adjustment.

Function y(x) in Fig. 13.1 is the structure of the open-loop part of the neural network. Methods of adjusting a multilayer neural network whose flexible structure is selected on the basis of a given probability of correct recognition involve successive learning of the neuron layers.


13.1 
Sequential Learning Algorithm for the First Neuron Layer 
of the Multilayer Neural Network 


Sequential learning algorithms for the first layer of the multilayer neural network are based on gradually increasing the number of hyperplanes. These hyperplanes form the resultant hypersurface until the required recognition quality, or some other condition for terminating the learning process, is achieved. The learning process reduces to forming a logical tree. The geometric interpretation is the following: the feature space is optimally divided into two parts by some fixed-structure neural network, then each of the obtained subspaces is divided again, and so on.


Fig. 13.1. Block diagram of the neural network with flexible structure and closed-cycle adjustment (blocks: neural network, adjustment block; output y(n))
Figures 13.2-13.4 show, respectively, the general block diagram of the sequential algorithm for designing the piecewise linear divisional surface realized by the flexible-structure neural network, a drawing of this surface, and the logical tree describing its design. In Fig. 13.2: I - unit for calculating the parameters of the fixed-structure neural network; II - unit for partitioning the input learning sample; and III - adjustment algorithm of the flexible-structure neural network at the first step. The resultant border between the two classes is shown by the double line in Fig. 13.3. The first hyperplane $\varphi_1(x)$ divides the feature space $\Omega_0$ into two sub-regions, $\Omega_1$ (first-class patterns) and $\Omega_2$ (second-class patterns). The learning sample $L_0$ is divided into two samples: $L_1$ (vectors from $\Omega_1$) and $L_2$ (vectors from $\Omega_2$). The numbers of incorrectly classified patterns in the sub-regions are $\theta_1$ and $\theta_2$. The larger element of the set $\{\theta_1, \theta_2\}$ is selected, and the corresponding sub-region is further divided. Let us assume


Fig. 13.2. Block diagram of the sequential algorithm for the design of the piecewise linear divisional surface

Fig. 13.3. Drawing of the piecewise linear divisional surface


13.1 - Sequential Learning Algorithm for the First Neuron Layer of the Multilayer Neural Network 
that $\theta_1 > \theta_2$. Then regions $\Omega_{11}$ and $\Omega_{12}$ are obtained by partitioning $\Omega_1$ with a new hyperplane. Then $\theta_{11}$ and $\theta_{12}$ are calculated and the recognition errors are compared. If $\theta_1 > \theta_{11} + \theta_{12}$, the new hyperplane improves the recognition quality. In this case, the sample $L_1$ is divided into sub-samples $L_{11}$ and $L_{12}$, and the process is repeated. As a result, one obtains the set of regions $\Omega_i, \Omega_{ij}, \dots, \Omega_{ijk\dots t}$, where the indexes $i, j, k, \dots, t$ take the values 1 and 2. If drawing a hyperplane in the sub-region $\Omega_{ijk\dots t}$ does not improve the recognition quality, the partitioning of the obtained regions must be continued. The number of steps that decrease the recognition


Fig. 13.4. Logical tree: a scheme of drawing the piecewise linear divisional surface shown in Fig. 13.3; b sequential numbering of the tree nodes


Fig. 13.5. Block diagram of the program realizing the algorithm for drawing the piecewise linear divisional surface (blocks: selection of patterns for the learning sample; logical tree; drawing of a new divisional surface; quality improvement estimation; drawing of a new branch of the logical tree; termination condition estimation; stop)
quality must be limited in the design of such algorithms [13-1]. If the error does not decrease within the given number of steps, the initial region $\Omega_{ijk\dots t}$ is excluded. The following rules for terminating the algorithm are considered in [13-1]: (1) termination at a given value of the error probability; (2) termination at a given number of hyperplanes (the number of neurons in the first layer). The block diagram of the program realizing the algorithm for drawing the piecewise linear divisional surface is shown in Fig. 13.5. Let us describe the operators that are not clear from the previous description.


Operator “Logical tree”. As seen in Fig. 13.4, the nodes of the tree are of two types: intermediate nodes and terminations. The tree root is the node with index 0, and terminations correspond to particular pattern classes. After passing through the operator “Logical tree”, each pattern x reaches one of the terminations. The function $\varphi_{ijk\dots t}(x)$ is used to decide the direction of further movement from the node $ijk\dots t$: if $\varphi_{ijk\dots t}(x) \ge 0$, the movement is to the right branch, and otherwise to the left. With sequential numbering of the nodes, the logical tree in Fig. 13.4a takes the form shown in Fig. 13.4b. The logical tree is conveniently described by the following three-column matrix:


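$$C = \begin{bmatrix} 0 & 2 & 3\\ 0 & 4 & 5\\ 0 & 6 & 7\\ 1 & 0 & 0\\ 0 & 8 & 9\\ 1 & 0 & 0\\ 0 & 10 & 11\\ 1 & 0 & 0\\ 2 & 0 & 0\\ 1 & 0 & 0\\ 2 & 0 & 0 \end{bmatrix}$$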


The s-th row of matrix C corresponds to the s-th node of the logical tree. The matrix rows, like the nodes, are of two types. A row (0 s s+1) describes an intermediate node of the tree: the divisional surface corresponding to this node is evaluated, and the transition is made to node s if the sign of its function is -1, or to node s+1 if the sign is +1. A row of the form (k 0 0), where k = 1, 2, describes one of the tree terminations: if a point $x_i$ arrives at such a node after the sequential application of several divisional surfaces, it belongs to the class $A_k$. Drawing a new hyperplane $\varphi_i(x)$ creates two new tree branches going from node i. The matrix with U rows gains two new rows, (U + 1) and (U + 2), of the following form:

U + 1: 1 0 0
U + 2: 2 0 0

and the i-th row gets the record

0 U+1 U+2,

i.e., the i-th node now becomes an intermediate node of the logical tree.
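The matrix description lends itself to a direct implementation. In the Python sketch below, rows are numbered from 1 as in matrix C above, and every intermediate row r is paired with a divisional function `phi[r]`; the functions used in the example are hypothetical and chosen only to exercise each branch. `classify` follows the rows until it reaches a termination (k 0 0), and `split_leaf` applies the rule for drawing a new hyperplane at termination i.

```python
# Matrix C from the text; row r (numbered from 1) is stored at C[r - 1]
C = [(0, 2, 3), (0, 4, 5), (0, 6, 7), (1, 0, 0), (0, 8, 9), (1, 0, 0),
     (0, 10, 11), (1, 0, 0), (2, 0, 0), (1, 0, 0), (2, 0, 0)]

def classify(C, phi, x):
    """Walk the logical tree from row 1 until a termination row (k 0 0) is reached."""
    r = 1
    while True:
        k, s, s_next = C[r - 1]
        if k != 0:                     # termination: the pattern belongs to class A_k
            return k
        # intermediate row (0 s s+1): go to s if phi_r(x) < 0, otherwise to s+1
        r = s if phi[r](x) < 0 else s_next

def split_leaf(C, i):
    """New hyperplane at termination i: append rows U+1, U+2 and rewrite row i."""
    U = len(C)
    C.append((1, 0, 0))
    C.append((2, 0, 0))
    C[i - 1] = (0, U + 1, U + 2)

# Hypothetical divisional functions for the intermediate rows 1, 2, 3, 5, 7 (2-D features)
phi = {1: lambda x: x[0], 2: lambda x: x[1], 3: lambda x: x[1],
       5: lambda x: x[0] + 1.0, 7: lambda x: x[0] - 1.0}
print(classify(C, phi, (-0.5, 1.0)))   # path 1 -> 2 -> 5 -> 9, class 2
```

After `split_leaf(C, i)` a divisional function for the new intermediate row i must be added to `phi` before patterns reaching that node can be classified.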
Operator “Drawing of a new divisional surface” can use any neuron adjustment algorithm described in Chap. 9. It can realize any fixed-structure neural network described in Chaps. 9 and 10.


Operator “Quality improvement estimation” estimates the improvement of the recognition quality. Its results are used in designing the logical tree: if the quality improves, then, for example, the region with the highest value of the average risk function is divided; otherwise, the most recently obtained regions are divided.
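The sequential procedure of Sect. 13.1 can be sketched compactly. The Python code below is a simplified illustration under stated assumptions, not the program of Fig. 13.5: the operator “Drawing of a new divisional surface” is replaced by the hypothetical `fit_hyperplane`, which draws the bisector of the class means (the simplified open-cycle rule discussed in Sect. 13.3), each region is labeled by its majority class, and a region is excluded after a single unsuccessful split instead of after a given number of steps.

```python
def mean(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def fit_hyperplane(region):
    """Hypothetical stand-in for 'Drawing of a new divisional surface'."""
    c1 = [x for x, label in region if label == 1]
    c2 = [x for x, label in region if label == 2]
    if not c1 or not c2:
        return None                               # a pure region needs no surface
    m1, m2 = mean(c1), mean(c2)
    w = [q - p for p, q in zip(m1, m2)]
    if all(v == 0 for v in w):
        return None                               # degenerate case: equal class means
    mid = [(p + q) / 2.0 for p, q in zip(m1, m2)]
    bias = -sum(wi * mi for wi, mi in zip(w, mid))
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + bias

def errors(region):
    """theta: patterns misclassified when the region is labeled by its majority class."""
    n1 = sum(1 for _, label in region if label == 1)
    return min(n1, len(region) - n1)

def sequential_partition(samples, max_planes=10, target_errors=0):
    """Greedy sequential partitioning; returns (number of hyperplanes, total errors)."""
    open_regions, closed = [samples], []
    planes = 0
    while open_regions and planes < max_planes:          # termination rule (2)
        if sum(errors(r) for r in open_regions + closed) <= target_errors:
            break                                         # termination rule (1)
        open_regions.sort(key=errors, reverse=True)
        region = open_regions.pop(0)                      # region with the largest theta
        phi = fit_hyperplane(region)
        if phi is None:
            closed.append(region)
            continue
        left = [s for s in region if phi(s[0]) < 0]
        right = [s for s in region if phi(s[0]) >= 0]
        if left and right and errors(left) + errors(right) < errors(region):
            open_regions += [left, right]                 # quality improved: accept split
            planes += 1
        else:
            closed.append(region)                         # no improvement: exclude region
    return planes, sum(errors(r) for r in open_regions + closed)

samples = [((0.0, 0.0), 1), ((0.0, 0.2), 1), ((0.2, 0.0), 1), ((4.0, 4.0), 1),
           ((4.0, 0.0), 2), ((0.0, 4.0), 2)]
planes, errs = sequential_partition(samples)
print(planes, errs)
```

In this toy sample the first hyperplane leaves one first-class pattern in the region dominated by the second class; the second hyperplane, drawn only inside that region, removes the remaining error.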


13.2 

Learning Algorithm for the First Neuron Layer of the Multilayer 
Neural Network Using the Method of Random Search of Local and 
Global Function Extrema 


This algorithm was designed on the basis of the methods described in Chap. 8. In this case one can ignore the design of the tree structure: all the neurons providing local extrema of the average risk function are included in the first layer (Figs. 13.6, 13.7). The four hyperplanes in the two-dimensional feature space shown in Fig. 13.6 determine four local extrema of the average risk function. The numbers inside the circles correspond to the numbers of the sets of logical function arguments in each region of the multidimensional feature space. Table 13.1 gives the logical function values for the example shown in Fig. 13.6. Tick marks correspond to values that are not defined for the given argument values. The zero index corresponds to regions without patterns.


Fig. 13.6. Illustration of the learning method for the first neuron layer of the multilayer neural network using an algorithm of random search

Fig. 13.7. Multi-extremum property of the average risk function under multi-modal distributions f(x/ε)
Table 13.1. Logical function values in the example in Fig. 13.6


Y; =| i fel i Jel 1 fel i) Jai il p= i pst il Jet 1 
W [al pol 1 i pa pl 1 i pet [=i 1 to pail fet fel fel 
V3 = Ji pel p= 1 ] 1 | fel fel fei fel 1 ] 1 1 
Wa Jol fel fel Jel fel fi fel f=! 1 1 1 1 1 1 1 1 
é a 2 e ~ TS 0 0 = Tl 0 0 ie 0 ] 1 o 


Deterministic search methods clearly cannot escape from a local extremum. The solution is to introduce random elements into the search procedure.

The main stages of the algorithm are the following:


a The components of the adjustment coefficient vector of the current neuron are selected at random;

b The next local extremum of the average risk function is found by some neuron learning method;

c This extremum is recorded if it was not found earlier.


After the third stage terminates, the algorithm returns to the first stage, and the adjustment coefficient vector of the next first-layer neuron is determined.
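Stages a to c amount to a multistart local search. The Python sketch below is a one-dimensional stand-in under assumptions not taken from the text: the multi-extremal average risk is the hypothetical function $R(a) = \sin 3a + 0.1a^2$, plain gradient descent with a fixed step plays the role of the neuron learning method, and two extrema are considered identical if they lie closer than a tolerance.

```python
import math
import random

def risk(a):
    """Hypothetical multi-extremal average-risk function (illustration only)."""
    return math.sin(3.0 * a) + 0.1 * a * a

def risk_grad(a):
    return 3.0 * math.cos(3.0 * a) + 0.2 * a

def local_search(a, lr=0.01, iters=2000):
    """Stage b: gradient descent from a to the nearest local minimum."""
    for _ in range(iters):
        a -= lr * risk_grad(a)
    return a

def random_search_extrema(n_starts=200, low=-3.0, high=3.0, tol=1e-3, seed=0):
    rng = random.Random(seed)
    found = []
    for _ in range(n_starts):
        a0 = rng.uniform(low, high)                    # stage a: random initial coefficient
        a_min = local_search(a0)                       # stage b: find a local extremum
        if all(abs(a_min - f) > tol for f in found):   # stage c: record it if it is new
            found.append(a_min)
    return sorted(found)

found = random_search_extrema()
print([round(a, 3) for a in found], [round(risk(a), 3) for a in found])
```

The number of restarts needed to collect every extremum grows with the number of extrema and with how small their basins of attraction are, which is the quantity studied experimentally below.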

The experimental investigation of one cycle of this learning algorithm for a first-layer neuron was described in Chap. 12. Figure 13.8 shows the block diagram of the program that realizes this algorithm.

The plan of experiments with this program was aimed at analyzing the properties of the learning process. The input signal characteristics and the adjustment algorithm were similar to those described in Sect. 12.2. The following characteristics must be analyzed:


1. The experimental estimate of the convergence of the random procedure;

2. The dependence of the total computing time on the feature space dimensionality N, the number of extrema U, and the step value Δ. The learning procedure is performed until successive random initial conditions lead to finding all the local extrema of the quality functional for the given modality of the input signal distribution functions. The number of random search steps required to find all the local extrema is given in Table 13.2, where U is the number of desired extrema and $\eta_U$ is the number of steps of the random procedure required to find all U extrema. The approximate estimates of the mathematical expectation and variance of the number of steps have the form (8.16a).


Table 13.2, together with the data described in Sect. 8.6, allows the total learning time to be estimated for a particular modality of the input signal. This time increases linearly with the feature space dimensionality.
Fig. 13.8. Block diagram of the program realizing the learning algorithm for the first neuron layer of the multilayer neural network using random search methods (blocks: input and initial data control; generation of the initial conditions for the gradient procedure, with a subprogram generating initial conditions with the uniform distribution; calculation of the optimized function argument increment; gradient calculation program; generator of the input vector with multi-modal distribution; investigation of the iteration procedure termination; comparison with the previously found optimal vectors; storage of the optimal vector as a new local extremum; local extrema counter; print of the learning procedure results; end)
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Table 13.2. 
Number of random search e Mu Mau Pau 
steps required for the search 7 1 1 S 
of all local extrema 

2 4 3 2 

3 8 6 3 

5 8 10 6 

7 23 14 8 

10 33 22 12 
Fig. 13.9. Illustration of the increase of the error probability at some step of the sequential algorithm: I - the first class; II - the second class


13.3 
Analysis of Algorithm Convergence under the Hyperplane 
Number Increase 


The convergence of the algorithm as the neural network structure becomes more complex depends on the rules for selecting the sub-region to be partitioned and on the learning algorithm used at each partition step. The method of sub-region selection described above is optimal with respect to the convergence rate. The simplified methods of sequential partitioning of the feature space usually used in practice, which consist of open-cycle adjustment of the neuron using the moments of the initial learning samples, often increase the error probability at some step of the algorithm (Fig. 13.9).

The non-shaded area in Fig. 13.9 corresponds to the next partition region, and the divisional surface is drawn perpendicular to the line linking the centers of the two classes. The error increased at this partition step because some patterns of the first class were classified as second-class patterns. To ensure a steady decrease of the error probability, one must use closed-cycle adjustment with minimization of the second discrete error moment $\alpha_{2g}$ at each step of the procedure. This yields the minimum number of first-layer neurons. However, it is sometimes necessary to deliberately increase the number of first-layer neurons in exchange for a sharp simplification of the neuron learning algorithm.
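The simplified construction mentioned above, a divisional surface perpendicular to the line linking the class centers, can be sketched as follows. The function names are hypothetical, the class centers are assumed to be estimated by sample means, and the surface is assumed to pass through the midpoint of the segment joining them.

```python
def bisector_hyperplane(class1, class2):
    """Return (w, b) such that w.x + b = 0 is perpendicular to the line through the means."""
    dim = len(class1[0])
    m1 = [sum(p[i] for p in class1) / len(class1) for i in range(dim)]
    m2 = [sum(p[i] for p in class2) / len(class2) for i in range(dim)]
    w = [q - p for p, q in zip(m1, m2)]              # normal vector along m2 - m1
    mid = [(p + q) / 2.0 for p, q in zip(m1, m2)]    # surface passes through the midpoint
    b = -sum(wi * mi for wi, mi in zip(w, mid))
    return w, b

def side(w, b, x):
    """+1 for the second-class side (w.x + b >= 0), -1 otherwise."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

w, b = bisector_hyperplane([(0.0, 0.0), (1.0, 0.0)], [(3.0, 0.0), (4.0, 0.0)])
print(w, b, side(w, b, (0.0, 0.0)), side(w, b, (4.0, 0.0)))
```

Because only the two sample means enter the construction, the surface ignores the spread and shape of each class, which is why a step of this kind can increase the number of misclassified patterns in a region.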

In the most unfavorable case, increasing the number of hyperplanes drives the estimate of the error probability to zero because the learning sample has finite length.
Fig. 13.10. The analysis of algorithm convergence under the hyperplane number increase on the learning and recognition stages (vertical axis: ΔP(H_1) = P_p - P_0)

Fig. 13.11. Formation of the learning sample at the first layer neuron output

Two stages of neural network design therefore exist: the algorithm learning stage and the algorithm precision estimation stage. Of a sample of length M, only the smaller part $M_1$ is used for learning. The trained algorithm is then applied for recognition to the remaining part $M_2 = M - M_1$, and the real precision of the algorithm is estimated by the recognition error probability $P_p(H_1)$. The function $\Delta P_p(H_1) = P_p(H_1) - P_0(H_1)$ shown in Fig. 13.10 should in principle grow with the number of hyperplanes, because the generalization capability of the algorithm decreases. Here $P_0(H_1)$ is the error probability at the neural network learning stage. The function $P_p(H_1)$ often has a local minimum at some finite value of $H_1$, for example $H_1^*$. It is recommended to use exactly this number $H_1^*$ of hyperplanes if $P_p(H_1^*)$ satisfies the initial requirements.
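The recommendation above can be written as a simple selection rule. In the Python sketch below, the error curves are hypothetical numbers chosen for illustration, not data from the text; the function returns the first local minimum $H_1^*$ of $P_p(H_1)$ if $P_p(H_1^*)$ meets the required level, and `None` otherwise.

```python
def select_hyperplane_number(h_values, p_recognition, p_required):
    """Return H1* at the first local minimum of P_p(H1) if P_p(H1*) <= p_required."""
    n = len(p_recognition)
    for i in range(n):
        falls_to_i = i == 0 or p_recognition[i] < p_recognition[i - 1]
        rises_after_i = i == n - 1 or p_recognition[i] <= p_recognition[i + 1]
        if falls_to_i and rises_after_i:
            return h_values[i] if p_recognition[i] <= p_required else None
    return None

# Hypothetical error curves for H1 = 1..6 hyperplanes (illustrative numbers only)
h_values = [1, 2, 3, 4, 5, 6]
p_learn = [0.30, 0.18, 0.10, 0.06, 0.03, 0.01]       # P_0(H1) on the learning part M_1
p_rec = [0.32, 0.22, 0.15, 0.17, 0.19, 0.23]         # P_p(H1) on the recognition part M_2
delta = [pp - p0 for pp, p0 in zip(p_rec, p_learn)]  # Delta P_p(H1)
print(select_hyperplane_number(h_values, p_rec, p_required=0.2))
```

With these illustrative curves the learning-stage error keeps falling while $\Delta P_p(H_1)$ grows, which is the behavior sketched in Fig. 13.10.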

A particular result of training the first-layer neurons of a multilayer neural network with two solutions is the logical function that determines the sequence of partitioning of the multidimensional feature space. This logical function is sometimes undefined not only on complete sets of arguments but also on some individual arguments. The simplest illustration of this kind of underdefiniteness of logical functions is given in Fig. 13.11 and Table 13.3. The Roman numerals indicate the initial regions of the feature space that form particular sets of arguments of the logical function $\varepsilon(y)$. The cells marked by tick marks indicate variable sets that never appear at the output of the first neuron layer. The cells marked by the sign ⊗ indicate variable values, out of the complete set of $2^3$ values, that also never appear at the output of the first neuron layer. The procedure of sequential partitioning shown in Fig. 13.11
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can be illustrated by the tree and matrix with the form represented in Fig. 13.12. Here 
I-IV are the regions obtained as a result of the sequential partitioning. 

The problem of extending the definition of the logical function ε(y) arises because arrays of learning vectors must be formed at the output of the first-layer neurons in order to adjust the subsequent neuron layers. The main problem here is to extend the definition of the logical function to the partially given sets of its arguments. Extending the definition to set 8 (Fig. 13.11, Table 13.3) is not necessary, because this set never appears in this particular task of piecewise linear divisional surface design. The definition is extended as follows. For each vector, the existing coordinates and the initial teacher instruction are kept, the absent variables are swept over all their possible values, and the resulting vectors are recorded in the learning array for the second-layer neurons of the multilayer neural network (Table 13.4).
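The extension rule can be sketched as below. For illustration only, it assumes that an absent coordinate (a cell marked © in Table 13.3) is stored as `None` and that coordinates take the values -1 and 1; `extend_learning_array` is a hypothetical name.

```python
from itertools import product


def extend_learning_array(rows):
    """Extend a partially defined learning array.

    rows: list of (y, eps) pairs, where y is a tuple of first-layer outputs
    (+1, -1, or None for an absent value) and eps is the teacher instruction.
    Every absent coordinate is swept over both values -1 and 1; the existing
    coordinates and the teacher instruction are copied unchanged.
    """
    extended = []
    for y, eps in rows:
        free = [k for k, v in enumerate(y) if v is None]
        for values in product((-1, 1), repeat=len(free)):
            full = list(y)
            for k, v in zip(free, values):
                full[k] = v
            extended.append((tuple(full), eps))
    return extended


rows = [((1, None, -1), 1), ((-1, 1, 1), -1)]
print(extend_learning_array(rows))
# [((1, -1, -1), 1), ((1, 1, -1), 1), ((-1, 1, 1), -1)]
```

A fully defined vector passes through unchanged, because the sweep over zero free coordinates yields exactly one combination.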

The logical function for the adjustment of all the neuron layers except the first one 
is formed in Table 13.4. 


Table 13.3. Underdefiniteness of a logical function 


[Table 13.3 body (regions 1-8; rows ε and y for the first, second, and third neurons, with tick marks and © signs as described above) not recoverable from the source.]


Table 13.4. Learning array for the second-layer neurons

[Table 13.4 body (rows ε and y for the first, second, and third neurons) not recoverable from the source.]
Fig. 13.12. Logical tree and matrix of transformations for the example represented in Fig. 13.11
13.4
Learning Algorithms for the Second Layer Neurons 
of the Two-Layer Neural Network 


13.4.1 
Condition for the Realizability of the Logical Function ε(y) Using One Neuron


The goal of this section is to test whether the logical function can be realized by a single second-layer neuron of the two-layer pattern recognition system. If the result of this test is negative, one must proceed to the synthesis of a three-layer neural network.


$$g(z) = \sum_{j=1}^{H_1} a_j y_j(z) + a_0, \qquad z = 1, \dots, Z$$


Figure 13.13 illustrates the realizability of the logical function using one neuron. Here, the neuron output analog signal g(z) is less than zero on all sets of the input binary variables y(z) (z is the number of the set) of the first class and greater than zero on all sets of the input binary variables of the second class. The value Δg = g⁽²⁾ - g⁽¹⁾, the gap between the output values of the two classes, is termed the "interval" [13-5]. The necessary and sufficient condition for the realizability of the logical function using one neuron can be written in the following form:


$$\varepsilon(z) = \operatorname{sign} g(z), \qquad g(z)\,\varepsilon(z) = |g(z)|, \qquad z = 1, \dots, Z \qquad (13.1)$$


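Since g(z)ε(z) ≤ |g(z)| for every z, the relations (13.1) hold for all z if and only if their sums over z are equal. Summing the left-hand and right-hand sides over all z therefore gives the condition for the realizability of the logical function using one neuron in the following form: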


$$\sum_{z=1}^{Z} g(z)\,\varepsilon(z) = \sum_{z=1}^{Z} |g(z)| \qquad (13.2)$$


or in another form 


$$\sum_{j=1}^{H_1} a_j \sum_{z=1}^{Z} y_j(z)\,\varepsilon(z) + a_0 \sum_{z=1}^{Z} \varepsilon(z) = \sum_{z=1}^{Z} \left| \sum_{j=1}^{H_1} a_j y_j(z) + a_0 \right| \qquad (13.3)$$


The expressions

$$\sum_{z=1}^{Z} y_j(z)\,\varepsilon(z), \qquad \sum_{z=1}^{Z} \varepsilon(z)$$


Fig. 13.13. The test of the logical function realizability using one neuron
are uniquely determined by the given logical function and can be calculated before solving the problem of synthesizing the neuron that realizes this logical function. Following [13-5], let us introduce the expression


$$b_j = \sum_{z=1}^{Z} y_j(z)\,\varepsilon(z), \qquad j = 0, \dots, H_1 \qquad (13.4)$$

Notice that

$$b_0 = \sum_{z=1}^{Z} \varepsilon(z), \quad \text{because } y_0(z) = 1, \; z = 1, \dots, Z$$


Expressions (13.3) and (13.4) give

$$\sum_{j=0}^{H_1} a_j b_j = \sum_{z=1}^{Z} \left| \sum_{j=1}^{H_1} a_j y_j(z) + a_0 \right|$$

or

$$\mathbf{a}^T \mathbf{b} = \sum_{z=1}^{Z} |g(z)| \qquad (13.5)$$
z=l 


Expression (13.5) gives the necessary and sufficient condition for the logical function ε(y) to be realizable by a neuron with weighting coefficient vector a.
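As a check of (13.4) and (13.5), the sketch below computes the characteristic vector b and tests whether a given weight vector a realizes ε. It is a minimal illustration under three assumptions: ±1 coding, an implicit y₀(z) = 1, and the extra requirement g(z) ≠ 0 so that sign g(z) is defined. The function names are hypothetical.

```python
import math


def characteristic_vector(sets, labels):
    """b_j = sum_z y_j(z) * eps(z), j = 0..H1, with y_0(z) = 1 (13.4)."""
    h1 = len(sets[0])
    b = [sum(labels)]
    b += [sum(y[j] * e for y, e in zip(sets, labels)) for j in range(h1)]
    return b


def realizes(a, sets, labels):
    """Condition (13.5): a^T b equals sum_z |g(z)|, g(z) = a_0 + sum_j a_j y_j(z)."""
    b = characteristic_vector(sets, labels)
    g = [a[0] + sum(aj * yj for aj, yj in zip(a[1:], y)) for y in sets]
    if any(gz == 0 for gz in g):          # sign g(z) undefined: not a valid realization
        return False
    lhs = sum(ai * bi for ai, bi in zip(a, b))
    return math.isclose(lhs, sum(abs(gz) for gz in g))


sets = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
and_labels = [-1, -1, -1, 1]               # conjunction y1 AND y2
xor_labels = [-1, 1, 1, -1]                # exclusive OR, not realizable by one neuron
print(characteristic_vector(sets, and_labels))   # [-2, 2, 2]
print(realizes([-1, 1, 1], sets, and_labels))     # True
print(realizes([-1, 1, 1], sets, xor_labels))     # False
```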

Let the logical function ε(z), determined in Z points of the H₁-dimensional binary argument y(z), not be realizable by a single neuron with weighting coefficient vector a. Then the scalar product of vector a and the characteristic vector of the logical function is less than the sum of the absolute values of the neuron output analog signal over all z = 1, …, Z. Consequently, the weighting coefficient vector of the neuron realizing the given logical function with characteristic vector b must minimize (down to zero) the following functional:


$$I(\mathbf{a}) = \sum_{z=1}^{Z} |g(z)| - \mathbf{a}^T \mathbf{b} \qquad (13.6)$$


Vector b is in some sense close to the weighting coefficient vector a of the neuron realizing the logical function that corresponds to b. The difference

$$\varepsilon(z) - \frac{1}{2^{H_1}}\,\mathbf{c}^T \mathbf{y}(z)$$

can be regarded as the error of the logical function realization [13-5]. For the complete set of arguments, the root-mean-square value of this error is minimal at c = b. Consequently, the corresponding vector b can sometimes be taken as the weighting coefficient vector a realizing the logical function. In general, however, vector b cannot always be taken as vector a.
Expression (13.5) is similar to expression (13.2). The latter can be represented as a system of linear inequalities, and the former as a nonlinear equation. Using (13.5) is somewhat simpler, because the initial logical function ε(y), defined in 2^{H₁} points of the H₁-dimensional space of binary (-1, 1) variables, is represented in (13.5) by the (H₁ + 1)-dimensional analog vector b, whereas in (13.2) it is represented by 2^{H₁} binary numbers.


13.4.2 
Synthesis of a Neuron by the Functional Minimization Method 


The correspondence between (13.2) and (13.5) noted above indicates the advantages of (13.5). However, it is difficult to write an explicit expression for the nonlinear term

$$\sum_{z=1}^{Z} |g(z)|$$

This difficulty can be overcome by an appropriate approximation. According to (13.6), the minimized functional has the following form:


$$I(\mathbf{c}, \mathbf{b}) = \sum_{z=1}^{Z} \left| \mathbf{c}^T \mathbf{y}(z) \right| - \mathbf{c}^T \mathbf{b} \qquad (13.7)$$


Here c is an arbitrary weighting coefficient vector for which the neuron has a nonzero analog error, and b is the characteristic vector of the given logical function. When determining the vector that minimizes (13.7), the following condition is assumed: either the vector a = c is the weighting coefficient vector realizing the given logical function, or the given logical function cannot be realized by only one neuron.

Figure 13.14 shows schematically the dependence of the summands in (13.7) on cᵢ for logical functions that are realizable and non-realizable by only one neuron: 1 - physically realizable logical function; 2 - non-realizable logical function.


Fig. 13.14. General form of the quality functional for physically realizable and non-realizable logical functions with the use of only one neuron
The described synthesis method for the second layer of the multilayer neural network is based on the following representation:


$$q|g(z)| = \xi_1 \left[q^2 g^2(z)\right] + \xi_2 \left[q^4 g^4(z)\right] + \dots$$


where q is a normalizing factor that limits the approximation region as follows: 0 < q|g(z)| < 1. Approximating q|g(z)| by k terms gives the k-th order approximation.

In the case of k = 1,

$$|g(z)| \approx \xi_1 q\, g^2(z)$$

Consequently,

$$\sum_{z=1}^{Z} |g(z)| \approx \xi_1 q \sum_{z=1}^{Z} g^2(z)$$


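or

$$\sum_{z=1}^{Z} g^2(z) = \sum_{z=1}^{Z} \sum_{i=0}^{H_1} \sum_{j=0}^{H_1} c_i c_j y_i(z) y_j(z) = \sum_{i=0}^{H_1} \sum_{j=0}^{H_1} c_i c_j \left( \sum_{z=1}^{Z} y_i(z) y_j(z) \right)$$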


The sum

$$d_{ij} = \sum_{z=1}^{Z} y_i(z) y_j(z)$$

is completely determined by the given logical function (by its argument values) and can be calculated before solving the neuron synthesis problem. The same holds for the characteristic vector. For the complete set of arguments of a logical function defined in 2^{H₁} points of the H₁-dimensional space of binary (-1, 1) variables, the following expression holds:


$$\sum_{z=1}^{2^{H_1}} y_i(z) y_j(z) = 2^{H_1} \delta_{ij} \qquad (13.8)$$


where δᵢⱼ is the Kronecker symbol. This expression is not valid in the general case of multilayer neural network synthesis. In this particular case, taking (13.8) into account, one obtains

$$\sum_{z=1}^{Z} |g(z)| \approx \xi_1 q\, 2^{H_1} \sum_{i=0}^{H_1} c_i^2 \qquad (13.9)$$
In the general case,

$$\sum_{z=1}^{Z} |g(z)| \approx \xi_1 q\, \mathbf{c}^T D\, \mathbf{c}; \qquad D = (d_{ij}) \qquad (13.10)$$

If (13.9) holds, then

$$I(\mathbf{c}, \mathbf{b}) = \xi_1 q\, 2^{H_1} \sum_{i=0}^{H_1} c_i^2 - \sum_{i=0}^{H_1} c_i b_i$$


Setting ∂I/∂cᵢ = 0 gives the optimal vector c that minimizes I(c, b):

$$c_i^* = \beta b_i$$

where

$$\beta = \left(2^{H_1 + 1} \xi_1 q\right)^{-1}$$


The realizability of the logical function using one neuron is invariant with respect to multiplying all aᵢ by the same positive constant. Therefore, the desired weighting coefficient vector realizing the logical function with characteristic vector b under condition (13.8) has the following form:

$$a_i = b_i, \qquad i = 0, \dots, H_1 \qquad (13.11)$$

Thus, at the first-order approximation under condition (13.8), the weighting coefficient vector is equal to the characteristic vector of the logical function. If in this case the first-order approximation does not prove adequate, one usually sets

$$a_i = b_i$$

only for i = 1, …, H₁, and varies the value a₀ so that the logical function can be realized using only one neuron (see the example below).
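In the general case, when (13.8) is not valid,

$$I(\mathbf{c}, \mathbf{b}) = \xi_1 q \left( \mathbf{c}^T D\, \mathbf{c} \right) - \sum_{i=0}^{H_1} c_i b_i$$

and the desired weighting coefficient vector is calculated according to the equation

$$\mathbf{a} = D^{-1} \mathbf{b}$$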
This is the main expression for neuron synthesis by functional minimization at the first-order approximation. The positive factor (2ξ₁q)⁻¹ that comes from setting the gradient to zero is dropped by the same invariance argument. The matrix D⁻¹ and the vector b are calculated from the initial values of the realized logical function. The value a₀ is calculated in the same way as in the case when (13.8) holds.
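The first-order synthesis rule can be sketched in a few lines. The sketch builds D and b from the argument sets, with y₀ = 1. It then confirms (13.8) for a complete set and solves a = D⁻¹b by Gauss-Jordan elimination for an incomplete set. It assumes that D is nonsingular. The helper names are hypothetical, and the small data sets are illustrations only.

```python
from itertools import product


def gram_and_b(sets, labels):
    """D = (d_ij) with d_ij = sum_z y_i(z) y_j(z), and b from (13.4); y_0(z) = 1."""
    ext = [(1,) + tuple(y) for y in sets]
    n = len(ext[0])
    D = [[sum(y[i] * y[j] for y in ext) for j in range(n)] for i in range(n)]
    b = [sum(y[i] * e for y, e in zip(ext, labels)) for i in range(n)]
    return D, b


def solve(D, b):
    """Solve D a = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(D, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("D is singular, so a = D^-1 b is undefined")
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


# Complete set of 2^H1 argument sets: D = 2^H1 * I, as stated in (13.8).
H1 = 3
full = list(product((-1, 1), repeat=H1))
D_full, _ = gram_and_b(full, [1] * len(full))
print(D_full == [[2 ** H1 if i == j else 0 for j in range(H1 + 1)] for i in range(H1 + 1)])  # True

# Incomplete set (the argument set (1, -1) never appears): D is not diagonal.
sets = [(-1, -1), (-1, 1), (1, 1)]
labels = [-1, -1, 1]
D, b = gram_and_b(sets, labels)
a = solve(D, b)
g = [a[0] + a[1] * y1 + a[2] * y2 for y1, y2 in sets]
print(all((gz > 0) == (e > 0) for gz, e in zip(g, labels)))  # True
```

Plain elimination is used here to keep the sketch dependency-free. With NumPy available, `numpy.linalg.solve` would be the usual choice.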
Example. Let the divisional surface configuration be as shown in Fig. 13.15.

Table 13.5 gives the values of a logical function of four variables. The tick mark indicates the argument values that do not participate in the formation of the initial piecewise linear divisional surface. The values of the binary input variables are ordered by increasing corresponding decimal number z. The full set of logical function values is realized by the following transformation:


$$\varepsilon = x_3 x_4 \vee x_1 x_3 \vee x_1 \bar{x}_2 x_4$$


The characteristic vector of the logical function is

$$b_i = \sum_{z=0}^{2^{H_1}-1} \varepsilon(z)\, x_i(z), \qquad i = 0, \dots, H_1; \quad x_0 = 1$$
Fig. 13.15. Illustration of the synthesis of the second layer neurons in the two-layer neural network by the method of functional minimization


Table 13.5. Values of the logical function of four variables 
In the considered example, b₀ = -2, b₁ = 6, b₂ = -2, b₃ = 10, and b₄ = 6. It is easy to check that a neuron with these coefficients realizes the initial logical function. Nevertheless, within the first-order approximation procedure, the coefficient b₀ is additionally varied. Using the calculated coefficients bᵢ and (13.11) for i = 1, …, H₁, one obtains the following values (Table 13.6):

$$B(z) = \sum_{i=1}^{H_1} a_i y_i(z)$$


The neuron threshold values (b₀ = a₀) are swept in unit steps over the interval from B(z)max - 0.5 to B(z)min + 0.5.
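The example can be reproduced numerically. The sketch assumes that bit k of z, with the least significant bit first, encodes x_{k+1}, using 0 → -1 and 1 → +1. This ordering was chosen because it reproduces the B(z) values of Table 13.6. The sketch computes b and B(z). It then sweeps the threshold T in unit steps between B(z)min + 0.5 and B(z)max - 0.5 and keeps the values for which sign(B(z) - T) = ε(z) on all 16 sets, so that a₀ = -T.

```python
H1 = 4


def args(z):
    """±1 arguments (x1..x4) of set number z; x1 is taken as the least significant bit."""
    return [1 if (z >> k) & 1 else -1 for k in range(H1)]


def eps(x):
    """eps = x3 x4 OR x1 x3 OR x1 (NOT x2) x4, coded as +1 / -1."""
    x1, x2, x3, x4 = (v == 1 for v in x)
    return 1 if (x3 and x4) or (x1 and x3) or (x1 and not x2 and x4) else -1


sets = [args(z) for z in range(2 ** H1)]
labels = [eps(x) for x in sets]

# Characteristic vector with x_0 = 1.
b = [sum(labels)] + [sum(e * x[i] for e, x in zip(labels, sets)) for i in range(H1)]

# B(z) with a_i = b_i for i = 1..H1, as in (13.11).
B = [sum(b[i + 1] * x[i] for i in range(H1)) for x in sets]


def realizing_thresholds(B, labels):
    """Thresholds T (a_0 = -T) in unit steps for which sign(B(z) - T) == eps(z) for all z."""
    good, T = [], min(B) + 0.5
    while T <= max(B) - 0.5:
        if all((1 if Bz > T else -1) == e for Bz, e in zip(B, labels)):
            good.append(T)
        T += 1
    return good


print(b)                                # [-2, 6, -2, 10, 6]
print(B)                                # [-20, -8, -24, -12, 0, 12, -4, 8, -8, 4, -12, 0, 12, 24, 8, 20]
print(realizing_thresholds(B, labels))  # [0.5, 1.5, 2.5, 3.5]
```

Every threshold strictly between the largest B(z) of the -1 sets (0) and the smallest B(z) of the +1 sets (4) works, which is consistent with b₀ = -2 realizing the function.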

The method of the neuron synthesis by means of the functional minimization and 
first-order approximation at the incomplete variable set determined by the divisional 
surface form (Fig. 13.16, Table 13.7) can be illustrated in a similar way. 

Consequently, the general procedure for neuron synthesis by functional minimization includes the following stages. Each stage is applied only if the logical function has not yet been realized at the previous one: (1) determination of the characteristic vector b; (2) determination of the threshold b₀; (3) use of the second-order approximation, etc.

The described method of neuron synthesis is evidently equivalent to the usual neural network synthesis methods with open-loop adjustment that take high-order moments of the input signal into account. In the described method, at the first-order approximation, the characteristic vector is the vector of the divisional surface drawn midway between the centers of the two classes.


Table 13.6. Values of B(z) obtained with the calculated coefficients bᵢ and (13.11)

z      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
B(z) -20  -8 -24 -12   0  12  -4   8  -8   4 -12   0  12  24   8  20

Table 13.7. Synthesis of the second layer neuron

[Table 13.7 body (rows y₁, y₂, y₃ and ε over the argument sets) not recoverable from the source.]

Fig. 13.16. Illustration of the synthesis of the second layer neuron in the two-layer neural network for a nonstrictly defined logical function ε(y)
13.4.3
Neuron Synthesis by the Threshold Function Tables 


The considerable attention paid to the synthesis of the second-layer neuron in the multilayer neural network with two solutions is explained by the characteristic features of operating with the first-layer neuron output signals in the binary space.

Neuron synthesis by threshold function tables [13-5] is based on the characteristic vectors of logical functions. This method gives the optimal neuron parameters whenever a one-neuron realization of the logical function is possible [13-5]. The method can be used when the number of first-layer neurons does not exceed seven. The construction of the tables of characteristic vectors and of the corresponding weighting coefficient vectors of the second-layer neuron is described in detail in [13-5]. The procedure consists of the following steps:


1. Determination of vector b;

2. Formation of the decreasing sequence of |bᵢ| values (i = 0, …, H₁) and a check of whether it is present in the corresponding table. If it is absent, the given logical function is not realizable by one neuron, and the synthesis procedure is terminated;

3. If the sequence is found in the table, the given logical function is realizable by one neuron. The weighting coefficient vector a is then found as follows. Write out the sequence |aᵢ| associated in the table with the sequence |bᵢ|. Then apply to the aᵢ exactly the same permutations and sign changes that were applied to vector b to bring it to its canonical form in the table. The resulting sequence of H₁ + 1 elements aᵢ (i = 0, …, H₁) gives the weighting coefficients of the second-layer neuron.
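The three steps can be sketched as below. The real tables of [13-5] are not reproduced here. `TOY_TABLE` is a hypothetical one-entry table for H₁ = 2, covering the conjunction, whose canonical |b| is (2, 2, 2), with an assumed canonical |a| of (1, 1, 1). It serves only to show the permutation and sign bookkeeping. Treating bᵢ = 0 as positive is also an assumption.

```python
def canonical_form(b):
    """Indices ordered by decreasing |b_i| (stable sort) and the canonical key."""
    order = sorted(range(len(b)), key=lambda i: -abs(b[i]))
    return order, tuple(abs(b[i]) for i in order)


def synthesize_from_table(b, table):
    """Steps 2-3: look up the canonical |b| sequence; undo permutation and signs on |a|."""
    order, key = canonical_form(b)
    if key not in table:
        return None                      # not realizable by one neuron
    a = [0] * len(b)
    for pos, i in enumerate(order):
        sign = -1 if b[i] < 0 else 1
        a[i] = sign * table[key][pos]
    return a


TOY_TABLE = {(2, 2, 2): (1, 1, 1)}       # hypothetical, not the [13-5] table

print(synthesize_from_table([-2, 2, 2], TOY_TABLE))  # [-1, 1, 1]
print(synthesize_from_table([0, 0, 0], TOY_TABLE))   # None
```

For b = (-2, 2, 2), which is the conjunction y₁ ∧ y₂ in ±1 coding, the result a = (-1, 1, 1) gives g = -1 + y₁ + y₂, which is positive only at y₁ = y₂ = 1. The exclusive OR has b = (0, 0, 0), which is not found.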


13.5 
Learning Algorithm for Neurons of the Second and Third Layers 
in the Three-Layer Neural Network 


Training the second and third layers of the three-layer neural network, once the first layer has been adjusted, is equivalent to the independent learning problem for a two-layer neural network with binary input signals. This section deals with two kinds of neural network design: in the form of a threshold-disjunctive neural network [13-5] and in the form of two neuron layers with adjustable coefficients.

The initial data for threshold-disjunctive neural network synthesis is the completely defined logical function ε(y), and the synthesis is performed in the following order:


1. Apply the Quine-McCluskey procedure to the function ε(y) until all its prime implicants are obtained;

2. Find all common intersections (centers of gravity) of two or more prime implicants, and combine into wyes the prime implicants that have a common center of gravity;

3. Find the characteristic vector of each wye, and check whether each wye is realizable by one neuron (using any method described in Sect. 13.4);
4. Find all possible sub-wyes for each wye that cannot be realized by one neuron. A sub-wye is defined as a subset of a wye that can be realized by one neuron and that is not a subset of any other wye;

5. Add the wyes and sub-wyes from items 3 and 4 to the list of prime implicants realizable by one neuron, and mark the sets covered by each record of this list;

6. Select the minimum number of records covering all the unit values of the logical function ε(y). The linear threshold elements that realize these records constitute either the first layer of the threshold-disjunctive network or the tandem network [13-5] equivalent to this threshold-disjunctive network.
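Step 1 relies on the standard Quine-McCluskey merging of implicants. A compact sketch of the prime-implicant stage follows. It assumes that the true sets of ε(y) are given as integer minterms over n binary variables, written as bit strings in which '-' marks an eliminated variable, and it omits the later wye-forming steps.

```python
from itertools import combinations


def merge(a, b):
    """Merge two implicants that differ in exactly one specified position, else None."""
    diff = [k for k, (p, q) in enumerate(zip(a, b)) if p != q]
    if len(diff) == 1 and '-' not in (a[diff[0]], b[diff[0]]):
        k = diff[0]
        return a[:k] + '-' + a[k + 1:]
    return None


def prime_implicants(minterms, n_vars):
    """Repeatedly merge implicants; those that are never merged are prime."""
    current = {format(m, f'0{n_vars}b') for m in minterms}
    primes = set()
    while current:
        merged, used = set(), set()
        for a, b in combinations(sorted(current), 2):
            c = merge(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        primes |= current - used
        current = merged
    return primes


# Majority of three variables: true on minterms 3, 5, 6, 7.
print(sorted(prime_implicants([3, 5, 6, 7], 3)))  # ['-11', '1-1', '11-']
```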


The method of the sub-wye search includes the following procedures:

1. Determine all the implicants that intersect the center of gravity of the considered wye;

2. Consider these implicants together with the prime implicants in all possible combinations, calculate their characteristic vectors, and test each combination for one-neuron realizability.

This method becomes rather complex when the number of prime implicants is large. Therefore, another method can be used for the sub-wye search:

1. If a wye that cannot be realized by one neuron consists of G prime implicants, consider all the groups of these prime implicants with G - 1 implicants in each group, and test each group for one-neuron realizability;

2. If at least one of these groups is one-neuron realizable, then this wye is two-neuron realizable without further partitioning;

3. If none of the groups is realizable by one neuron, then the procedure is repeated with G - 2 prime implicants in each group;

4. The process continues until all the prime implicants are used up. The one-neuron realizable groups obtained are the desired sub-wyes.
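The second (group-reduction) method can be sketched as follows. The one-neuron test is passed in as a function, for instance the characteristic-vector check of (13.5) or a table lookup. The stopping rule used here ends the search once every prime implicant belongs to some realizable group. That is one reading of "until all the prime implicants are used up" and is an assumption.

```python
from itertools import combinations


def find_sub_wyes(implicants, one_neuron_realizable):
    """Test groups of G-1, G-2, ... prime implicants of a non-realizable wye.

    Groups contained in an already accepted group are skipped, so every
    returned group is maximal among the realizable groups found.
    """
    found, covered = [], set()
    for size in range(len(implicants) - 1, 0, -1):
        for group in combinations(implicants, size):
            if any(set(group) <= set(f) for f in found):
                continue
            if one_neuron_realizable(group):
                found.append(group)
                covered.update(group)
        if covered == set(implicants):
            break
    return found


def toy_test(group):
    """Hypothetical realizability test: any group containing both A and B fails."""
    return not {"A", "B"} <= set(group)


print(find_sub_wyes(["A", "B", "C"], toy_test))  # [('A', 'C'), ('B', 'C')]
```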


The synthesis procedure for a nonstrictly defined logical function ε(y) consists of the following steps:

1. Extend the definition of the logical function ε(y) to all the variable sets where it takes arbitrary values;

2. Perform the threshold-disjunctive network synthesis described above for a completely defined logical function until all the wyes and sub-wyes are found to be one-neuron realizable;

3. Compose the table of prime implicants. Its number of rows equals the number of wyes, sub-wyes, and prime implicants obtained at the second step of the synthesis procedure, and its number of columns equals the number of sets of the logical function ε(y). Here, all the arbitrary values of ε(y) are taken equal to -1;

4. Select the minimal set of table records covering all the unit values of ε(y). At this point, the definition of its arbitrary values is extended automatically. The synthesis process is then terminated.
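Step 4, like step 6 of the completely defined case, is a set-cover selection. For small tables, the exhaustive search sketched below tries records in order of increasing number and returns a minimal cover. Only the unit sets of ε(y) are required, so sets with arbitrary values may or may not end up covered. The record names and covered sets are hypothetical. Larger tables would need a greedy or branch-and-bound cover instead.

```python
from itertools import combinations


def minimal_cover(records, unit_sets):
    """records: name -> set of argument-set numbers covered by that record.
    unit_sets: argument sets where eps(y) = 1 (all must be covered).
    Returns a smallest tuple of record names covering unit_sets, or None."""
    names = sorted(records)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            covered = set().union(*(records[n] for n in combo))
            if unit_sets <= covered:
                return combo
    return None


records = {"wye1": {1, 2, 5}, "pi2": {2, 3}, "sub3": {3, 4, 6}}
print(minimal_cover(records, {1, 2, 3, 4}))  # ('sub3', 'wye1')
```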
The two output layers can be designed as a neural network with adjustable coefficients on the basis of the following considerations.

The feature space in this case is binary, and its dimensionality equals the number of first-layer neurons. Therefore, the second-layer neurons of the three-layer neural network can be trained using any method described in Sects. 13.1 and 13.2. Then, after training of the second-layer neurons is completed, the logical-tree structure of the third layer can be tested for realizability by one third-layer neuron.


13.6 
General Methods of the Multilayer Neural Network Successive Synthesis 


The methods of successive adjustment of the three-layer neural network described above can be generalized to multilayer neural networks as follows:


1. The first neuron layer of the multilayer neural network is adjusted using the initial samples. The number of neurons and the values of the adjustment coefficients are selected;

2. The obtained logical function is checked for one-neuron realizability. If it is realizable, the network synthesis is terminated;

3. Otherwise, the second-layer neurons are trained as in item 1. The number of neurons and the values of the adjustment coefficients are selected;

4. The obtained logical function is checked for one-neuron realizability, and so on, as in item 2.
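The successive synthesis loop of items 1-4 can be sketched as follows. `train_layer` stands for whichever layer-training method is used and is hypothetical here. `brute_realizable` is a crude one-neuron test that searches small integer weights and is practical only for a handful of binary inputs. In practice, the characteristic-vector tests of Sect. 13.4 would take its place.

```python
from itertools import product


def brute_realizable(samples, labels, weights=range(-2, 3)):
    """Crude one-neuron test: is there an integer (w0, w1..wn) with sign(g) == label everywhere?"""
    n = len(samples[0])
    for w in product(weights, repeat=n + 1):
        if all((1 if w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)) > 0 else -1) == e
               for x, e in zip(samples, labels)):
            return True
    return False


def successive_synthesis(samples, labels, train_layer, realizable, max_layers=10):
    """Items 1-4: train a layer, map the samples through it, test one-neuron realizability."""
    layers = []
    for _ in range(max_layers):
        layer = train_layer(samples, labels)          # items 1 and 3
        layers.append(layer)
        samples = [layer(x) for x in samples]
        if realizable(samples, labels):               # items 2 and 4
            return layers                             # one output neuron suffices
    raise RuntimeError("no one-neuron-realizable representation within max_layers")


def fixed_first_layer(samples, labels):
    """Hypothetical stand-in for a trained layer: two hyperplanes for the XOR problem."""
    return lambda x: (1 if x[0] - x[1] > 1 else -1, 1 if x[1] - x[0] > 1 else -1)


xor_x = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
xor_y = [-1, 1, 1, -1]
print(brute_realizable(xor_x, xor_y))                                                # False
print(len(successive_synthesis(xor_x, xor_y, fixed_first_layer, brute_realizable)))  # 1
```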


This technique is easily generalized to a neural network with a solution continuum. Here, the numbers of first-class and second-class patterns are preserved in the transfer from one layer to the next. The quality criterion of the multilayer neural network is not only the probability of correct recognition at the network output but also how this probability changes from layer to layer.

Applying the described technique for multilayer neural network synthesis yields the number of layers, the number of neurons in each layer, and the values of the adjustable coefficients. Note that the described learning method allows one to train, at each learning step, any structure considered in Chap. 9 instead of a single neuron.

The successive adjustment procedure is easily generalized to the self-learning mode. In this case, the optimization criterion for drawing the next hyperplane is the minimum of the specific average risk function.


13.7 
Learning Method for the First-Layer Neurons of a Multilayer 
Neural Network with a Feature Continuum 


This section deals with the learning algorithm for the first layer of a multilayer neural network with a feature continuum and with ways of realizing it physically. The peculiarities of the learning process for multilayer neural networks with a feature continuum appear in the training of the first-layer neurons. In the simplest case, the function a(i) and the coefficient a₀ have the following form:
$$a(i) = m_1(i) - m_2(i)$$

$$a_0 = \frac{1}{2}\left[\int m_2^2(i)\,di - \int m_1^2(i)\,di\right]$$

If x₁(i, n) and x₂(i, n) are the sets of patterns of the first and second classes, then the functions m₁(i) and m₂(i) are

$$m_k(i) = \frac{1}{M} \sum_{n=1}^{M} x_k(i, n), \qquad k = 1, 2$$


For two-dimensional i, the functional transformations described above can be implemented by photographic methods. The result of learning in this case is a set of photomasks realizing the functions a(i) that modulate the light flux x(i, n) (see Chap. 4), together with the coefficient a₀.

For one-dimensional i, when curves or electrical signals are recognized within a fixed observation interval, the functions a(i) and the coefficient a₀ can be obtained quite simply by analog means.
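On a digital computer, the same first-layer neuron can be sketched by sampling i on a uniform grid and replacing the integrals with the trapezoidal rule. The grid step, the example curves, and the function names below are illustrative assumptions. The neuron output is g(x) = ∫ a(i) x(i) di + a₀, which is positive for the first class.

```python
def trapezoid(values, step):
    """Trapezoidal approximation of the integral over i on a uniform grid."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def class_mean(patterns):
    """m_k(i): average of the sampled curves x_k(i, n) over n."""
    return [sum(col) / len(patterns) for col in zip(*patterns)]


def train_continuum_neuron(class1, class2, step):
    m1, m2 = class_mean(class1), class_mean(class2)
    a = [p - q for p, q in zip(m1, m2)]                       # a(i) = m1(i) - m2(i)
    a0 = 0.5 * (trapezoid([v * v for v in m2], step)
                - trapezoid([v * v for v in m1], step))       # a0 = (∫m2² - ∫m1²) / 2
    return a, a0


def neuron_output(a, a0, x, step):
    """g = ∫ a(i) x(i) di + a0 (discretized)."""
    return trapezoid([ai * xi for ai, xi in zip(a, x)], step) + a0


step = 0.1                                  # grid on i in [0, 1], 11 samples
class1 = [[1.0] * 11, [1.2] * 11]           # illustrative curves only
class2 = [[0.0] * 11, [0.2] * 11]
a, a0 = train_continuum_neuron(class1, class2, step)
print(round(a0, 6))                                   # -0.6
print(neuron_output(a, a0, [1.0] * 11, step) > 0)     # True
print(neuron_output(a, a0, [0.0] * 11, step) > 0)     # False
```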

The sequential learning technique for the neuron with a feature continuum remains 
the same as in the case of the discrete feature set. 


13.8 

Application of the Adjustment Algorithm of the Multilayer 
Neural Networks with Flexible Structure for the Problem 
of Initial Condition Selection 


Figure 13.17 shows the block diagram of the program that realizes the sequential design of the piecewise linear divisional surface for initial condition selection.

The idea of applying it to initial condition selection in the closed-cycle learning procedure of multilayer fixed-structure neural networks is discussed below. The fixed structure of the multilayer neural network constrains the number of neurons, at least in the first layer, and the algorithm can diverge. Therefore, statistical methods for calculating the recognition error probability are necessary. However, this can be ignored in the initial condition selection procedure. After the algorithm terminates, the position of the divisional surface can be improved by sequential closed-cycle adjustment of each neuron, with a learning quality test for the whole piecewise linear surface. The closed-cycle adjustment must be performed 2H₁ times, i.e., 2H₁ × M iterations, where H₁ is the number of first-layer neurons and M is the sample length. It is useful to reduce the sub-sample size to some optimal value.

One method of forming such an optimal sub-sample is the deterministic selection of K (or 2K, 3K, etc.) patterns, each belonging to one of the K pattern classes. In the limit of the smallest sample size, one passes to deterministic or random selection of initial conditions, depending respectively on the presence or absence of a priori information.
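A deterministic K, 2K, 3K, … sub-sample can be sketched as below. The text does not specify which patterns of each class are taken. Taking the first `per_class` patterns of each class in sample order is an assumption made for illustration.

```python
def deterministic_subsample(samples, labels, per_class=1):
    """Select per_class patterns of each class (K, 2K, 3K, ... patterns in total)."""
    taken = {}
    chosen = []
    for x, c in zip(samples, labels):
        if taken.get(c, 0) < per_class:
            chosen.append((x, c))
            taken[c] = taken.get(c, 0) + 1
    return chosen


samples = [[0.1], [0.2], [1.1], [2.3], [1.4], [2.0]]
labels = [0, 0, 1, 2, 1, 2]
print(deterministic_subsample(samples, labels))                    # [([0.1], 0), ([1.1], 1), ([2.3], 2)]
print(len(deterministic_subsample(samples, labels, per_class=2)))  # 6
```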
Fig. 13.17. Block diagram of the program realizing the algorithm with flexible structure
[Blocks: Selection of sub-sample from the main sample → Logical tree → Drawing of a new divisional surface → Quality improvement test → Design of the additional logical tree branch → Test of termination conditions → Stop]


13.9


About the Self-Learning Algorithm for Multilayer Neural Networks 
with Flexible Structure 


The described technique for adjusting multilayer neural networks with flexible structure can be used to solve the self-learning (clustering) problem. This applies when the neural network input receives a random sample with a multimodal distribution and without teacher instructions about the class membership of the patterns. The multilayer neural network is then trained to recognize two pattern classes:


• The first class is the initial sample;

• The second class is an artificially generated random sample with a uniform probability distribution over the range of feature variation. The feature space dimensions of the first-class and second-class samples are equal.
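The construction of the two-class training problem can be sketched as follows. Several choices here are assumptions: the range of feature variation is taken as the per-feature minimum and maximum of the initial sample, the artificial sample is made the same size as the initial one, the class codes are 1 and 2, and the function name is hypothetical.

```python
import random


def make_self_learning_problem(sample, seed=0):
    """Class 1: the unlabeled sample; class 2: uniform noise over the feature ranges."""
    dim = len(sample[0])
    lo = [min(x[k] for x in sample) for k in range(dim)]
    hi = [max(x[k] for x in sample) for k in range(dim)]
    rng = random.Random(seed)
    artificial = [[rng.uniform(lo[k], hi[k]) for k in range(dim)] for _ in sample]
    patterns = [list(x) for x in sample] + artificial
    classes = [1] * len(sample) + [2] * len(artificial)
    return patterns, classes


sample = [[0.0, 0.0], [0.2, 0.1], [3.0, 2.9], [3.1, 3.0]]   # two clusters
X, y = make_self_learning_problem(sample)
print(len(X), y.count(2))  # 8 4
```

A network trained to separate class 1 from class 2 then places its decision regions around the dense clusters of the original sample.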
Chapter 14


Informative Feature Selection in Multilayer Neural Networks 


14.1 
Statement of the Informative Feature Selection Problem 
in the Learning Mode 


Informative feature selection is an independent problem of pattern recognition theory, and it has not yet been solved. This book discusses the existing approaches to its solution and describes the so-called structural methods based on the synthesis of multilayer neural network pattern recognition systems [14-1, 14-2].

Three statements form the basis of the proposed method: 


1. The common idea that informative features can be selected before the multilayer neural network is adjusted is incorrect, because a trained multilayer neural network is already present, explicitly or implicitly, in any known selection procedure;

2. Only the primary optimization criterion accepted for the given system can serve as a criterion of feature informativity. Any other criteria usually introduce additional errors and restrict the domain of applicability;

3. It is necessary to select the types of multilayer neural networks that are most objective in the informative feature selection procedure, i.e., that provide the optimal solution over a sufficiently wide range of input signal characteristics (number of classes, complexity of the distributions within the classes).


The problem of informative feature selection was initially stated as the selection of N₁ = const features out of N initial features. These N₁ selected features are supposed to provide the minimum recognition error probability. This problem statement can be interpreted in another form: selection of the minimum number of features N₁


Fig. 14.1. Selection of informative features in the initial feature space
Fig. 14.2. Classification of methods for the informative feature selection


providing a given probability of correct recognition. Let us define the feature informativity criterion for this case. Suppose that the neural networks NN₀, NN₁, and NN₂, with the corresponding numbers of features N = N₁ + N₂, N₁, and N₂ (Fig. 14.1), provide the probabilities of correct recognition P₀, P₁, and P₂. If P₁ > P₂, then the group of N₁ features is more informative than the group of N₂ features. In this case, if the increment ΔP = P₀ - P₁ of the correct recognition probability is large enough to cover the cost of the increased system complexity due to adding the N₂ features, then using the group of N₂ features is worthwhile.

This statement of the informative feature selection problem is used in a wide range of practical tasks. For example, in a particular task of feature informativity estimation, the probability of correct recognition P_corr can be analyzed for four feature groups: (x₁, …, x_N), ((x₁, …, x_N) ∧ xᵢ), ((x₁, …, x_N) ∧ xⱼ), and ((x₁, …, x_N) ∧ (xᵢ, xⱼ)). We consider that the particular problem of selecting N₁ features out of N to achieve the maximum probability of correct recognition cannot be solved without solving the problem in the statements described above.

In the particular case of a multilayer neural network with full connections, the problem consists in minimizing the number of threshold elements in each layer. The minimization criterion described above remains valid here. Both statements of the informative feature selection problem mentioned above are combined in the general structural approach to this problem, in which the first layer is considered to be organized a priori in the form shown in Fig. 14.1.

Figure 14.2 shows a block diagram illustrating methods of informative feature selection in connection with the problem statements and informativity criteria considered above.




This block diagram represents only the main directions of the solution. It does not claim to be complete and is intended only to introduce the structural methods of informative feature selection. The main development relates to divergence, conditional entropy, and some of their simplified estimates. The methods also include approaches based on component analysis and analysis of variance.

The main goal of the present chapter is to consider structural methods for informative feature selection. They estimate feature informativity from the results of multilayer neural network adjustment. When the structure of the adjusted multilayer neural network is minimized, the minimization method depends on the adjustment method.


14.2 
About Structural Methods for the Informative Feature Selection 
in the Multilayer Neural Networks with Fixed Structure 


The structural methods of informative feature selection estimate the informativity of the initial feature space from the parameters and structure of the optimally adjusted multilayer neural network. This section illustrates structural informativity estimation using the example of a single neuron.

Let us consider multilayer neural networks of the one-neuron type, or a neuron with a layer of nonlinear or random-nonlinear transformations (Chaps. 1 and 2). A multilayer neural network in the form of one neuron is optimal for pattern sets with normal multidimensional distributions and equal covariance matrices. When the covariance matrices are unit matrices (up to a constant multiplier), the degree of class overlap along each feature is determined by the corresponding inclination of the optimal linear divisional surface (Fig. 14.3).

The circles in Fig. 14.3 indicate the isolines of the densities f_1(x) and f_2(x). Suppose that, as above, the main feature informativity criterion is the probability of correct recognition. It is then easy to show that the i-th coefficient of the optimal divisional surface can serve as a relative estimate of the informativity of the i-th feature.
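The following minimal numerical sketch illustrates this claim. It assumes two normal classes with a common covariance σ²I and equal prior probabilities. The class means m1 and m2 are illustrative numbers and are not taken from Fig. 14.3. Under these assumptions the coefficients of the optimal divisional surface are w = (m2 - m1)/σ². Using feature i alone gives a correct recognition probability of Φ(|m2_i - m1_i|/(2σ)). This probability grows monotonically with |w_i|, so ranking the features by |w_i| ranks them by their single-feature correct recognition probability.

```python
import math

def phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def optimal_coefficients(m1, m2, sigma2=1.0):
    """Coefficients of the optimal linear divisional surface for two normal
    classes with common covariance sigma2 * I: w = (m2 - m1) / sigma2."""
    return [(b - a) / sigma2 for a, b in zip(m1, m2)]

def single_feature_p_corr(m1, m2, sigma2=1.0):
    """Correct recognition probability using feature i alone, with equal
    priors and the threshold at the midpoint: Phi(|m2_i - m1_i| / (2 sigma))."""
    sigma = math.sqrt(sigma2)
    return [phi(abs(b - a) / (2.0 * sigma)) for a, b in zip(m1, m2)]

# Illustrative class means (assumed, not from the text).
m1 = [0.0, 0.0, 0.0]
m2 = [2.0, 0.5, -1.0]
w = optimal_coefficients(m1, m2)
p = single_feature_p_corr(m1, m2)
rank_by_w = sorted(range(len(w)), key=lambda i: -abs(w[i]))
rank_by_p = sorted(range(len(p)), key=lambda i: -p[i])
print(w)                     # [2.0, 0.5, -1.0]
print(rank_by_w, rank_by_p)  # [0, 2, 1] [0, 2, 1]
```

For a general common covariance matrix Σ, the coefficients would be Σ⁻¹(m2 - m1) instead. This sketch does not cover that case.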


Fig. 14.3. Proof that the neuron coefficients can be used as feature informativity estimates
The coefficients of the optimal neuron can also serve as feature informativity estimates for non-normal distributions, but only at the level of an open-loop structure as simple as a single neuron. Consider non-normal distributions and a nonlinear multilayer neural network. Here the neuron coefficients of the optimal nonlinear network estimate the informativity of the composite features formed by the nonlinear transformation layer. A similar conclusion holds for the three-layer Rosenblatt perceptron.

Structure minimization is a separate problem when a fixed-structure multilayer neural network is adjusted in several stages, each started from randomly selected initial conditions. It requires averaging the results of the adjustment procedure over the set of random initial conditions. These initial conditions are needed to search for locally optimal adjustable coefficients. Comparing the minimized structures with the locally optimal values of the average risk function gives a direct rule for minimizing the number of neurons in a fixed-structure multilayer neural network adjusted in a closed loop.
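A minimal sketch of the averaging over random initial conditions follows. Several names and choices here are hypothetical and serve only as illustration:

- The callable `train` returns the locally optimal average risk of one adjustment run for a given number of neurons.
- The tolerance `tol` is an assumed parameter.
- The decision rule is also an assumption: take the smallest number of neurons whose averaged risk is within `tol` of the best averaged risk.

The function `toy_train` merely stands in for a real closed-loop adjustment run.

```python
import random
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple

def select_min_neurons(train: Callable[[int, random.Random], float],
                       candidates: Sequence[int],
                       n_restarts: int = 10,
                       tol: float = 0.02,
                       seed: int = 0) -> Tuple[int, Dict[int, float]]:
    """Average the locally optimal risk over n_restarts random initial
    conditions for each candidate neuron count, then return the smallest
    count whose averaged risk is within tol of the best averaged risk."""
    rng = random.Random(seed)
    avg_risk = {h: mean([train(h, rng) for _ in range(n_restarts)])
                for h in candidates}
    best = min(avg_risk.values())
    chosen = min(h for h in candidates if avg_risk[h] <= best + tol)
    return chosen, avg_risk

def toy_train(h: int, rng: random.Random) -> float:
    """Hypothetical stand-in for one adjustment run: the risk saturates at
    0.1 for h >= 4, plus a small spread that depends on the restart."""
    return max(0.1, 0.5 - 0.1 * h) + rng.uniform(0.0, 0.01)

chosen, avg_risk = select_min_neurons(toy_train, candidates=range(1, 7))
print(chosen)  # 4
```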

The problem of minimizing the number of neurons must be considered separately for a learning procedure in which each neuron is trained independently, with its own random initial conditions. Suppose independent training of the H_1 first-layer neurons is complete, so that a local extremum of the optimization functional has been found. Selecting from the adjustment results the single neuron among the H_1 that gives the extremal value of the functional is then trivial. Selecting H_P < H_1 neurons that jointly give the extremum of the optimization functional is considerably more complicated and may be unsolvable in this formulation. This is illustrated by the example in Fig. 14.4. There the error probability (in percent) is shown for each choice of threshold, and the numbers near the arrows indicate the class number.


Fig. 14.4. Example of minimization of the number of first-layer neurons in the multilayer neural network: 1 - the first class; 2 - the second class
Fig. 14.5. Illustration of the local optimality property of the informative feature selection procedure: 1 - the first class; 2 - the second class


This approach shows its limitations when generalized to the case of an unknown and complicated distribution f'(x/ε). This limitation, however, is fully explained by the impossibility of selecting the informative features before the adjustment stage is complete. Figure 14.5 illustrates this property with a particular example. It shows the isolines of f'(x/ε) in the multimodal case and four positions of the piecewise linear divisional surface, each providing a local extremum of P_corr. Consequently, any informativity estimate obtained for a fixed-structure open-loop multilayer neural network is not only subjective but also local. The reason is that the adjusted fixed-structure network attains only a local extremum of the optimization functional. The same arguments hold in the self-learning mode.


14.3 
Selection of the Initial Space Informative Features Using Multilayer Neural 
Networks with Sequential Algorithms of the First-Layer Neuron Adjustment 


The main problem is to estimate the relative value of the correct recognition probability from the structure of the trained neural network and the learning results. Two feature groups are compared. Several methods of estimating feature informativity can be proposed.

1. Assume that a given value P_corr = const (in particular, P_corr = 1) is achieved. This is done using multilayer neural networks with sequential learning algorithms for the first-layer neurons and some finite learning sample. Let the first neural network, with characteristics N_1 and P_corr, have more neurons than the second neural network, with characteristics N_2 and the same P_corr. Then the group of N_1 features is less informative than the group of N_2 features. This estimation method is valid only under certain conditions described below.

2. Assume that the minimal recognition error is achieved at each step of the first-layer learning procedure. The learning results are shown in Fig. 14.6a as curves of P_corr versus the number H_1 of first-layer neurons. The curves are given for the feature groups N_1 (network NN_1) and N_2 (network NN_2). The figure shows which of the two feature groups is less informative: it is the group whose curve P_corr(H_1) lies lower. This method includes method 1 as a particular case.
Fig. 14.6. Selection of informative features using a multilayer neural network with flexible structure


3. When the learning sample is sufficiently large, the dependence P_corr(H_1) has the form shown in Fig. 14.6b. The curve P_corr(H_1) is then close to its asymptote, which means a transition from the statistical mode to the deterministic one. Informativity estimation reduces to comparing the steady-state values of P_corr(H_1).

4. Figure 14.6b,c shows the general case of a non-optimal adjustment algorithm for the first-layer neurons of the multilayer neural network. In this case informativity is estimated either as in method 3 or at any H_1. The estimate is then valid only for the given adjustment algorithm and the given number of first-layer neurons.

5. It was assumed above that the learning sample coincides with the initial sample. Consider instead the case where the learning sample is only a smaller part of the initial sample. The learning procedure is then performed on some part ΔM_i of the initial sample, while recognition by the trained multilayer neural network is carried out on the full sample. The learning results P_corr(H_1, ΔM_i) and the recognition results are illustrated in Fig. 14.6d. Their analysis allows one to estimate the stationarity and representativeness of the learning sample, as well as the informativity of different feature groups.
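Methods 1 to 3 compare two P_corr(H_1) curves in different ways, as the following sketch shows:

- Method 1 compares the number of first-layer neurons needed to reach a target P_corr.
- Method 2 checks whether one curve dominates the other.
- Method 3 compares the steady-state values.

The curves below are made-up numbers used only for illustration. The tail length used to approximate the asymptote is an assumed parameter.

```python
from typing import Optional, Sequence

def neurons_to_reach(curve: Sequence[float], target: float) -> Optional[int]:
    """Method 1: smallest H_1 (curve[0] is H_1 = 1) with P_corr >= target."""
    for h, p in enumerate(curve, start=1):
        if p >= target:
            return h
    return None  # target not reached with the available neurons

def dominates(curve_a: Sequence[float], curve_b: Sequence[float]) -> bool:
    """Method 2: curve_a lies on or above curve_b at every H_1 and
    strictly above it at least once."""
    pairs = list(zip(curve_a, curve_b))
    return all(a >= b for a, b in pairs) and any(a > b for a, b in pairs)

def steady_value(curve: Sequence[float], tail: int = 3) -> float:
    """Method 3: approximate the asymptote of P_corr(H_1) by the mean of
    the last `tail` points."""
    last = list(curve)[-tail:]
    return sum(last) / len(last)

# Illustrative P_corr(H_1) curves for feature groups N_1 and N_2 (assumed).
p_n1 = [0.60, 0.70, 0.76, 0.80, 0.81, 0.81]
p_n2 = [0.65, 0.78, 0.86, 0.90, 0.91, 0.91]

print(neurons_to_reach(p_n1, 0.8), neurons_to_reach(p_n2, 0.8))  # 4 3
print(dominates(p_n2, p_n1))                                      # True
print(steady_value(p_n1) < steady_value(p_n2))                    # True
```

All three criteria agree here that group N_1 is less informative. As noted in method 4, such a verdict holds only for the adjustment algorithm that produced the curves.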


14.4 
Neuron Number Minimization 


The sequential adjustment of the first-layer neurons of the multilayer neural network (Chap. 13) is described by a graph in the form of a logical tree. Each top (vertex) of the tree corresponds to a neuron and to the increment ΔP_corr obtained after this neuron is introduced. Such a graph is the initial information for the minimization procedure mentioned above. It can be minimized in one of two formulations: minimize the number of tops for a given P_corr, or maximize P_corr for a given number of tops.

Figure 14.7 illustrates the initial information for graph minimization. The number of each top in the initial graph is shown in the left part of the circle. Its number in the resulting minimized graph is shown in the right part. The number of each graph edge coincides with the number of the divided region (Chap. 13). The sub-region containing the largest number of vectors of the first and second classes is selected for the next division. Dashed lines correspond to sub-regions with relatively few vectors. The corresponding increment ΔP_corr, positive or negative, is written in square brackets near each top. The logical tree is optimized as follows:


1. At the first branching (neurons 3 and 8 in the initial graph), the increments ΔP_corr are compared. The neuron with the maximum ΔP_corr is selected for the optimized graph (neuron 3 in this case);

2. Then the neurons of the current and the next branchings are compared by ΔP_corr (neurons 8 and 4). Again the neuron with the maximum ΔP_corr is included in the optimized graph;

3. The process continues until the accumulated correct recognition probability reaches a given value P_corr or the number of tops reaches a given value.


The described procedure yields the optimal tree traversal, whose order is shown in Fig. 14.7a by the numbers in the right part of the circles (tops). Figure 14.7b shows the result of optimizing the graph of Fig. 14.7a for two criteria: P_corr > 0.7 and P_corr > 0.73. The order of tree traversal in the optimized graph does not coincide with the order at the learning stage.
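The three steps above amount to a greedy best-first traversal of the tree, sketched below under the following assumptions:

- The dictionary layout `Tree` and all increments are hypothetical; they do not reproduce Fig. 14.7.
- The root top is assumed to be always selected.
- The candidates at each step are the unselected tops whose parent has already been selected. This set covers the remaining branches of the given branching and the branches of the next one.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical tree: top id -> (increment dP_corr, list of child top ids).
Tree = Dict[int, Tuple[float, List[int]]]

def optimal_traversal(tree: Tree, root: int, target_p: float,
                      max_tops: int) -> Tuple[List[int], float]:
    """Greedy best-first traversal: among the tops whose parent is already
    selected, always take the one with the largest dP_corr; stop when the
    accumulated P_corr reaches target_p or max_tops tops are selected."""
    order, total = [root], tree[root][0]
    frontier = [(-tree[c][0], c) for c in tree[root][1]]  # max-heap via negation
    heapq.heapify(frontier)
    while frontier and total < target_p and len(order) < max_tops:
        neg_delta, top = heapq.heappop(frontier)
        order.append(top)
        total -= neg_delta  # i.e. total += dP_corr of this top
        for c in tree[top][1]:
            heapq.heappush(frontier, (-tree[c][0], c))
    return order, total

tree: Tree = {
    1: (0.50, [2, 5]),
    2: (0.10, [3, 4]),
    3: (0.02, []),
    4: (0.08, []),
    5: (0.12, [6]),
    6: (0.03, []),
}
order, total = optimal_traversal(tree, root=1, target_p=0.75, max_tops=10)
print(order, round(total, 2))  # [1, 5, 2, 4] 0.8
```

The two stopping conditions correspond to the two formulations of the minimization problem: a given P_corr or a given number of tops. The selected order need not follow the order in which the tops were introduced during learning.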


Fig. 14.7. Minimization of the first-layer neurons of a multilayer neural network with flexible structure: a initial graph; b optimized graphs for P_corr > 0.7 and P_corr > 0.73
The idea of using sequential algorithms for learning the second-layer neurons is to apply them to each learning vector. Each vector is weighted by P_corr in the sub-region that contains it. The number of neurons in the second and subsequent layers is then minimized on the same principle as for the first layer. Note that neuron number minimization becomes less important as the layer number increases. This follows from the structure of the open-loop multilayer neural network: because of data compression, the number of neurons decreases from the first layer to the output.


14.5 
About the Informative Feature Selection for Multilayer Neural Networks 
in the Self-Learning Mode 


All the problem statements for informative feature selection described in Sect. 14.1 remain valid in the self-learning mode. Only the informative feature selection criterion changes. In the learning mode, the criterion is the value of the average risk function (in particular, the correct recognition probability). In the self-learning mode, it is the value of the special average risk function. Sect. 14.3 describes informative feature selection methods for the learning mode and for recognition systems with flexible structure. These methods, together with the corresponding methods of network structure minimization, can be generalized to the self-learning mode in a relatively simple way. For a recognition system with fixed structure, structure minimization must rely on two analyses: of the adjusted multilayer neural network structure and of the obtained value of the special average risk function.
Neural Network Reliability


15.1 
Methods for the Neural Network Functional Reliability Investigation 


The first attempts to estimate the functional reliability of neural networks were experimental [15-1] or qualitative [15-2]. The qualitative estimates showed that neuron-like elements possess logical redundancy [15-2, 15-3]. That is, failures of some elements do not cause errors at the neural network output.

Attempts at analytical investigation of neural network reliability encounter mathematical difficulties. Some works [15-4 to 15-6] state that a complete analytical investigation of neural networks is impossible. These studies considered several particular neural networks and analyzed them with Markov process theory using state graphs. The drawback of this approach is the resulting system of differential equations, which is very difficult to solve explicitly even for the simplest graphs.

The work [15-7] treats neural network reliability in the sense of logical stability. Logical stability is investigated using stability maps. These maps can be written explicitly only for the simplest neural networks, such as a triplet of threshold elements. This approach cannot be used in practice.

In the study [15-7], some empirical expressions were derived for specific neural networks with several constraints on their complexity. However, calculations based on these expressions cannot provide an objective estimate of the reliability functional for the threshold element networks under consideration.

The reliability of a single neuron whose weighting coefficients and input values are random was investigated in [15-9 to 15-12]. The authors did not obtain an analytical result even in the simplest case they considered. These studies have several disadvantages:

- all analytical calculations were based on experimental data and an extrapolated neuron probabilistic relay function (PRF);
- some intermediate results, for example the PRF mathematical expectation in [15-7], were obtained with additional simplifications because of their mathematical complexity;
- the final expression can be integrated only numerically.

Nevertheless, we consider this the most successful analytical investigation of neuron reliability. It takes into account the functional structure of the neuron and the probabilistic model of its functioning.

Thus, analytical investigation of multilayer neural network reliability requires a fundamentally new approach. Otherwise, reliability must be studied experimentally by Monte Carlo methods.
The reliability of the so-called generalized threshold element is investigated analytically in [15-13]. The originality of this approach is that any combinational network of neurons can be realized as a layer of neurons. Such a network is a multilayer neural network with sequential or cross [15-14] connections. The functional reliability of a neuron layer can sometimes be reduced to the reliability of a single neuron. For this reason, analytical investigation is of fundamental significance in this case.

Experimental investigation of reliability can be performed at three levels: circuit, functional, and logical. The functional and logical levels are considered in [15-4, 15-5]. Neuron failures are usually divided [15-15] into two classes: parametrical and catastrophic. Parametrical failures are caused by gradual changes of the weighting coefficients and threshold under external factors (supply voltage or temperature changes, component aging, etc.). Catastrophic failures are caused by open-circuit or short-circuit faults.

Experimental methods of parametrical failure investigation assume that the weighting coefficients and thresholds of all neurons are random values with normal probability distributions. The distribution parameters are assumed to be known, and the random values are modeled by the Monte Carlo method. This technique allows one to analyze a wide class of multilayer neural networks and draw general conclusions. It also makes it possible to estimate the parametrical reliability of particular implementations.

The investigation of the functional reliability of multilayer neural networks is clearly incomplete without an analysis of catastrophic failures. A special technique was developed and used to study failures of the logical constant type at the neuron inputs and outputs (input-output stuck-at faults). The analysis successively models stuck-at faults of all types for each neuron and then calculates the failure-free performance probability. This approach reveals “potentially dangerous” failures, which cause a drastic decrease of the failure-free performance probability compared with other failures. The results can be used to eliminate the possibility of “potentially dangerous” failures at the design stage.

The developed experimental technique yields quantitative characteristics of multilayer neural network reliability. It also makes it practical to investigate the reliability of concrete neural network implementations.


15.2 
Investigation of Functional Reliability of Restoring Organs 
Implemented in the Form of Multilayer Neural Networks 


The reliability of digital devices built on any element basis, and in particular on multilayer neural networks, is currently of great interest. The reason is the growing complexity of computers and of the functions they perform. This problem is especially significant for computers operating in a non-repairable mode (for example, on-board computers), i.e., without the possibility of repair.

After the foundational work of von Neumann [15-16], many researchers investigated the synthesis of reliable digital devices from unreliable elements. The works [15-17 to 15-21] deserve special mention in this connection. The proposed methods are based on introducing some logical redundancy into the design of the digital device. Redundancy can be classified [15-22] as hardware (structural [15-23]), temporal, and informational. This classification is conditional, because redundancy of any one type is usually accompanied by redundancy of the other types. The classification in Fig. 15.1 is made according to how the redundancy increases system reliability.


Fig. 15.1. Classification of methods for redundancy introduction: hardware redundancy (static, dynamic, hybrid); temporal redundancy (software); informational redundancy (error-correcting codes)

Hardware redundancy can be used at any functional level, from components up to the whole system. Three types of hardware redundancy can be implemented, depending on the activity of the main and redundant components [15-24]: static [15-25], dynamic [15-4], and hybrid [15-2, 15-26, 15-27].

With static redundancy, all components, main and redundant, operate simultaneously. The effects of failures are overcome automatically by error correction based on the redundant components.

With dynamic redundancy, the redundant devices start operating only when a failed unit must be replaced. This redundancy provides system self-restoration. It requires testing and diagnostic methods for failure detection.

Hybrid redundancy combines static and dynamic redundancy. Several duplicated devices operate permanently, and a failed one is replaced by a redundant device.

The following main features of static redundancy are usually distinguished:


- Error correction without interruption of operation;

- Correction of errors caused by both permanent and short-duration failures;

- A significant increase of the failure-free performance probability of low-reliability devices at a low level of redundancy;

- Universality and no need to develop special software for error detection, localization, and correction, which are significant advantages of static redundancy.
The majority redundancy scheme [15-17, 15-28] is often used to design static redundancy. It involves n-fold replication of components or units, with the outputs of the redundant units feeding the restoring organ [15-17, 15-29]. With majority voting, the restoring organ realizes the following decision rule: its output equals the value taken by the majority of its inputs.
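For the majority decision rule, the probability of correct system output has a simple closed form under the usual textbook assumptions. The n replicas (n odd) fail independently, and each one is correct with probability r. The output is then correct when at least (n + 1)/2 replicas are correct. The factor `r_voter` is an assumed simplification: it models an unreliable restoring organ whose failure always produces a wrong output. The sketch below uses these assumptions.

```python
from math import comb

def majority_reliability(r: float, n: int, r_voter: float = 1.0) -> float:
    """Probability that an n-fold majority-redundant system is correct,
    assuming independent replicas that are each correct with probability r
    and a restoring organ that is correct with probability r_voter."""
    if n % 2 == 0:
        raise ValueError("majority voting here assumes an odd number of replicas")
    k_min = n // 2 + 1  # smallest majority
    p_majority = sum(comb(n, k) * r**k * (1 - r)**(n - k)
                     for k in range(k_min, n + 1))
    return r_voter * p_majority

print(round(majority_reliability(0.9, 3), 3))  # 0.972 (= 3r^2 - 2r^3)
print(round(majority_reliability(0.9, 5), 5))  # 0.99144
```

The formula also shows a known limitation of static majority redundancy. It helps only when r > 0.5; for example, `majority_reliability(0.4, 3)` is lower than 0.4.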

The dependence of the failure-free operation probability of the restoring organ on the type of decision rule is investigated. Different restoring organ schemes implemented as multilayer neural networks are considered. The admissible regions of parameter variation for optimal functioning of the restoring organ are discussed.


15.3 
Investigation of Multilayer Neural Network's Functional Reliability 


Methods of investigating the functional reliability of multilayer neural networks can be classified as analytical and experimental. Analytical investigation at the neuron level faces mathematical complexity. Consequently, the main attention is paid to the experimental investigation of multilayer neural network reliability (see Sect. 15.4). The main results described below are presented in [15-30, 15-33 to 15-36].

The class of multilayer neural networks with binary inputs is considered. The investigation of their functional reliability is based on the following:


a Functional reliability criterion;
b Probabilistic model of the neural network functioning;
c Set of input values.


Two functional reliability criteria are used below: the probability of correct functioning of the multilayer neural network and the probability distribution function of its output signal.

The probabilistic model of neural network functioning depends on the physical nature of the considered failure types (parametrical or catastrophic). For example, Sect. 15.4 considers parametrical failures and assumes that the weighting coefficients and thresholds of all neural network elements are random.

It is useful to divide the experimental investigation into stages according to the neuron failure classes. Neuron failures are usually divided into two classes [15-15, 15-37]: parametric and catastrophic. The experimental investigation is therefore divided into two stages. The first investigates reliability with respect to parametrical failures (parametrical reliability). The second investigates reliability with respect to catastrophic failures (catastrophic reliability).

The experimental technique was developed for the class of neural networks with se- 
quential connections. It allows one to analyze the neural networks with an arbitrary num- 
ber of inputs, arbitrary number of neurons in the layers, and arbitrary number of layers. 

The parametrical failures of neurons [15-15] include the errors at the neuron outputs 
caused by the gradual changes of weighting coefficients and threshold under the influ- 
ence of the external physical factors: temperature changes, supply voltage changes, etc. 

Experimental investigation of parametrical failures by the Monte Carlo method models the weighting coefficients and thresholds as random values with normal probability distributions.
Catastrophic failures include failures caused by open-circuit or short-circuit faults. Failures of this type can be reduced to failures of the logical constant type (const = 0 and const = 1) at the neuron inputs and outputs [15-37]. Neuron failures are assumed to be random, independent, and equally probable. In the catastrophic reliability investigation, the failure type and the number of the failed neuron are chosen deterministically. This approach reveals “potentially dangerous” failures. Such failures cause a drastic decrease of the failure-free performance probability relative to some a priori given value.
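The deterministic stuck-at analysis can be sketched on a small example under the following assumptions, all made only for illustration:

- The network is a hypothetical two-layer threshold network that realizes XOR.
- Only stuck-at faults at neuron outputs are modeled; input faults are omitted for brevity.
- All binary input vectors are equally probable.
- The `danger_level` of 0.6 is an assumed threshold.

For each neuron and each constant value, the code fixes the neuron output at that value. It then measures the fraction of inputs on which the network still matches the fault-free network.

```python
import itertools

# Hypothetical two-layer threshold network realizing XOR (illustration only).
# Each neuron: (name, input weights, threshold); output = 1 if w.x - a0 > 0 else 0.
LAYERS = [
    [("h1", (1.0, 1.0), 0.5),   # OR
     ("h2", (1.0, 1.0), 1.5)],  # AND
    [("y", (1.0, -1.0), 0.5)],  # OR and not AND
]

def forward(x, fault=None):
    """Propagate a binary input; fault = (neuron name, stuck value) or None."""
    signals = list(x)
    for layer in LAYERS:
        out = []
        for name, w, a0 in layer:
            if fault is not None and fault[0] == name:
                out.append(fault[1])  # output stuck at a logical constant
            else:
                s = sum(wi * si for wi, si in zip(w, signals)) - a0
                out.append(1 if s > 0 else 0)
        signals = out
    return signals[0]

def stuck_at_analysis(n_inputs=2, danger_level=0.6):
    """Probability of correct output for every single output stuck-at fault,
    plus the list of faults whose probability falls below danger_level."""
    inputs = list(itertools.product((0, 1), repeat=n_inputs))
    reference = {x: forward(x) for x in inputs}
    results = {}
    for layer in LAYERS:
        for name, _, _ in layer:
            for value in (0, 1):
                ok = sum(forward(x, (name, value)) == reference[x] for x in inputs)
                results[(name, value)] = ok / len(inputs)
    dangerous = sorted(f for f, p in results.items() if p < danger_level)
    return results, dangerous

results, dangerous = stuck_at_analysis()
print(results)
print(dangerous)  # [('h1', 0), ('h2', 1), ('y', 0), ('y', 1)]
```

In this toy network, h1 stuck at 1 and h2 stuck at 0 still leave 75% of the inputs correct. The other four faults reduce the probability to 50% and are flagged as "potentially dangerous".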

The obtained values of the correct recognition probabilities can be used to eliminate the possibility of “potentially dangerous” failures at the design stage. The developed experimental technique allows one to analyze a wide class of multilayer neural networks and draw general conclusions. Its advantage is that it can also estimate the reliability of particular implementations.


15.4 
Investigation of the Neural Network’s Parametrical Reliability 


The investigation of parametrical reliability considered several implementations of logical functions. It also considered two- and three-layer neural networks with different numbers of neurons in the first layer. The experimental study pursued the following objectives:


1. To analyze parametrical reliability for different one-neuron realizations of some logical function, and to find the optimal realization on this basis. Here parametrical reliability is the dependence of the probability of correct functioning on the variance D[a] of the weighting coefficients and thresholds. It is analyzed at different fixed shifts -Δa of their mathematical expectation;

2. To analyze how the dependence of the correct recognition probability on the variance of the weighting coefficients and thresholds changes with:


a The increase of the number of first-layer neurons in two-layer neural networks;

b The increase of the number of first-layer neurons in three-layer neural networks;

c The transition from a two-layer to a three-layer neural network realizing the same logical function with a fixed number of first-layer neurons;

d The transition from a two-layer to a three-layer neural network with the same total number of neurons.


The stages of the performed investigations are described below. 


Stage 1. The selection of the optimal realization by the maximum parametrical reliability criterion is shown using three different realizations of the majority function as an example. Each neuron realizes a hyperplane that crosses the unit N-dimensional hypercube, where N is the number of neuron inputs. The hyperplane separates two classes of vertices:

(1) vertices with fewer than N/2 unit components;
(2) vertices with more than N/2 unit components (i.e., fewer than N/2 zero components).

From the set of all neurons realizing the majority decision rule, let us choose neurons with unit weighting coefficients. Their hyperplanes are parallel to one another and cross all coordinate axes at equal angles. The majority element is usually chosen in this case; it satisfies the following equation:


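$$y = \operatorname{sgn}\left(\sum_{i=1}^{N} x_i - a_0\right), \qquad (15.1)$$

where x_i ∈ {0, 1} are the input values, a_0 = (N - 1)/2 is the threshold of the majority element, and

$$\operatorname{sgn}(s) = \begin{cases} 0, & s \le 0, \\ 1, & s > 0. \end{cases} \qquad (15.2)$$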


This realization is the limiting case for decreasing the threshold, because the neuron (15.1), (15.2) no longer realizes the majority rule at

$$a_0 = \frac{N-1}{2} - \varepsilon,$$

where ε is an arbitrarily small positive value. The hyperplane corresponding to the majority element for N = 3 is shown in Fig. 15.2.

Here and below, crosses indicate the input and intermediate values corresponding to the unit output, and circles indicate the values corresponding to the zero output. The realization that limits the threshold increase in the considered family of neurons is the neuron described by (15.1) and (15.2) with the threshold (N + 1)/2.

Figure 15.3 shows a hyperplane corresponding to such a realization for N = 3. 

Therefore, let us investigate neurons with unit weights and thresholds from the 
interval [(N-1)/2, (N + 1)/2]. Three neurons with the thresholds (N - 1)/2, (N + 1)/2, 
and N/2 were taken for the experimental study. 


Fig. 15.2. Hyperplane realized by the majority element in the space of inputs (a_0 = (N - 1)/2)

Fig. 15.3. Hyperplane realized by the neuron with unit weights and the threshold (N + 1)/2 in the space of inputs
A shift -Δa of the mathematical expectation of the weighting coefficients corresponds to a shift and rotation of the hyperplane. Investigating the three realizations above for different Δa therefore determines the optimal realization for arbitrary Δa. The experiment was performed for three values of the mean shift: Δa = 0.15, 0.0, and -0.15. The averaged experimental curves are shown in Fig. 15.4a-c.


Fig. 15.4. Average curves of the correct recognition probability versus the variance of the weighting coefficients for three realizations of one neuron: a Δa = 0; b Δa = 0.15; c Δa = -0.15
Fig. 15.5. Hyperplane realized by the optimal neuron


Variance D[a] is measured in the same units as the weighting coefficients. The correct recognition probability generally decreases as the variance increases. The optimal realization depends on Δa:

- In Fig. 15.4a, at Δa = 0, it is the neuron with the threshold a_0 = 1.5.
- In Fig. 15.4b, at Δa = 0.15, it is the neuron with the threshold a_0 = 2.
- In Fig. 15.4c, at Δa = -0.15, it is the neuron with the threshold a_0 = 1.0.

Averaging the corresponding correct recognition probability values over the three values of Δa shows that the neuron with the threshold a_0 = 1.5 is the optimal realization for arbitrary Δa. Thus, one can draw a general conclusion for the optimal neuron with the threshold a_0 = N/2 and, consequently, with the thresholds (N - 1)/2 at Δa > 0 and (N + 1)/2 at Δa < 0.

For real neurons, Δa can be either positive or negative. The optimal neuron realization then corresponds to the hyperplane equidistant from the symmetric points of both classes. This is the hyperplane passing through the midpoints of the corresponding hypercube edges. Fig. 15.5 shows this hyperplane for N = 3.

The following conclusion can be drawn from these results. When a multilayer neural network is synthesized on a set of binary input signals, each hyperplane must pass through the midpoints of the corresponding hypercube edges. The neurons and the neural network then have maximal parametrical reliability with respect to all other possible realizations. Only such optimal neural networks with optimal neurons are considered at all subsequent stages.
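The Stage 1 experiment can be reproduced qualitatively with a short Monte Carlo sketch for a neuron with N = 3 unit-mean weights, under these assumptions:

- Each weight is independently normal with mean 1 + Δa and variance D.
- The threshold is normal with mean a_0 and the same variance D.
- All binary inputs are equiprobable.

Applying the shift Δa only to the weight means is an assumption, since the text leaves the convention open. The sketch only illustrates the Δa = 0 comparison and does not reproduce the curves of Fig. 15.4.

```python
import itertools
import random

def majority(x):
    """Majority function for an odd number of binary inputs."""
    return 1 if 2 * sum(x) > len(x) else 0

def p_correct(threshold, variance, n_inputs=3, delta_a=0.0,
              trials=5000, seed=0):
    """Monte Carlo estimate of the probability that a neuron with random
    parameters realizes the majority function on equiprobable binary inputs.
    Assumed model: each weight ~ N(1 + delta_a, variance), threshold ~
    N(threshold, variance), all independent; output per (15.1), (15.2)."""
    rng = random.Random(seed)
    sd = variance ** 0.5
    inputs = list(itertools.product((0, 1), repeat=n_inputs))
    correct = 0
    for _ in range(trials):
        # one random realization of the neuron parameters
        w = [rng.gauss(1.0 + delta_a, sd) for _ in range(n_inputs)]
        a0 = rng.gauss(threshold, sd)
        for x in inputs:
            s = sum(wi * xi for wi, xi in zip(w, x)) - a0
            y = 1 if s > 0 else 0  # sgn(s) as in (15.2): 0 for s <= 0
            correct += (y == majority(x))
    return correct / (trials * len(inputs))

for a0 in (1.0, 1.5, 2.0):  # thresholds (N-1)/2, N/2, (N+1)/2 for N = 3
    print(a0, round(p_correct(a0, variance=0.1), 3))
```

At Δa = 0 the threshold N/2 = 1.5 gives the highest estimate. This agrees with the argument that the optimal hyperplane passes through the midpoints of the hypercube edges.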


Stage 2. Let us consider six different neural networks with two, three, ..., seven neurons in the first layer. All the neurons realize optimal hyperplanes according to the results obtained at stage 1.

Figure 15.6 shows one such realization: a two-layer neural network with H_1 = 2 neurons in the first layer. In this figure, x_i^j is the output value of the i-th neuron of the j-th layer. The experimental curves of the correct recognition probability versus the variance at Δa = 0 are shown in Fig. 15.7a,b. For clarity, the curves for H_1 = 2, 4, 6 and H_1 = 3, 5, 7 are displayed in separate panels.

The obtained results show that the parametrical reliability remains constant as $H_1$ increases when the variance is small: $0 < D[a] < D^*[a]$ ($D^*[a] = 0.6$). For $D^*[a] < D[a] < 2$, the probability increases with $H_1$; this increase is especially pronounced at $H_1 = 6, 7$. Consequently, for two-layer neural networks the dependence of the correct recognition probability on the variance improves as $H_1$ increases, i.e., their parametrical reliability increases.
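The stage 2 experiment can be sketched as a Monte Carlo loop over a whole network. The example is a hypothetical two-layer network with $H_1 = 2$ realizing XOR (the source does not specify the logical function); its hyperplanes $x_1 + x_2 = 0.5$, $x_1 + x_2 = 1.5$ and $x_1^1 - x_2^1 = 0.5$ all pass through edge midpoints, as required at stage 1. The Gaussian scatter model and the names `forward` and `network_crp` are assumptions.

```python
import itertools
import numpy as np

# Hypothetical two-layer XOR network built from "optimal" neurons.
XOR_NET = [
    (np.array([[1.0, 1.0], [1.0, 1.0]]), np.array([0.5, 1.5])),  # OR, AND
    (np.array([[1.0, -1.0]]), np.array([0.5])),                  # OR and not AND
]

def forward(x, layers):
    """A neuron outputs 1 when its weighted sum exceeds its threshold."""
    v = np.asarray(x)
    for W, a in layers:
        v = (W @ v - a > 0).astype(int)
    return v

def network_crp(layers, variance, n_inputs, trials=2000, seed=0):
    """Monte Carlo CRP when every weight and threshold gets zero-mean Gaussian
    noise of the given variance (an assumed model of parameter scatter)."""
    rng = np.random.default_rng(seed)
    std = np.sqrt(variance)
    inputs = list(itertools.product([0, 1], repeat=n_inputs))
    targets = [forward(x, layers) for x in inputs]  # failure-free responses
    correct = 0
    for _ in range(trials):
        noisy = [(W + std * rng.standard_normal(W.shape),
                  a + std * rng.standard_normal(a.shape)) for W, a in layers]
        correct += sum(np.array_equal(forward(x, noisy), t)
                       for x, t in zip(inputs, targets))
    return correct / (trials * len(inputs))
```

Repeating `network_crp` over a grid of variances and over networks with different $H_1$ would give curves of the kind shown in Fig. 15.7; the specific numbers depend on the assumed function and noise model.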
Fig. 15.8.
Stage 3. Let us consider five different three-layer neural networks with three, four, ..., seven neurons in the first layer, two neurons in the second layer, and one neuron in the third layer. All the neurons realize optimal hyperplanes. The parametrical reliability curves for $H_1 = 3, 5, 7$ are shown in Fig. 15.8.

An analysis of these curves leads to a conclusion similar to that of stage 2: the parametrical reliability of three-layer neural networks increases with $H_1$.


Stage 4. This stage deals with the change of parametrical reliability when a two-layer neural network is replaced by a three-layer neural network that realizes the same logical function with the same value of $H_1$. The results obtained at stages 2 and 3 can be used here. The corresponding comparative characteristics of two- and three-layer neural networks with $H_1 = 3, 5, 7$ are shown in Figs. 15.9-15.11.

It can be seen that the transfer from the two-layer neural network to the three-layer neural network decreases the parametrical reliability. For comparison, Fig. 15.9 also shows the curve for a single neuron realizing the same logical function. One can conclude that increasing the number of layers decreases the parametrical reliability.
Fig. 15.10. Correct recognition probability
Stage 5. Let us consider all pairs of two- and three-layer neural networks from the set described above that have the same total number of neurons. There are only three such pairs: with six neurons (3 + 2 + 1 and 5 + 1), seven neurons (4 + 2 + 1 and 6 + 1), and eight neurons (5 + 2 + 1 and 7 + 1). The corresponding curves are shown in Fig. 15.12a-c. One can conclude that the two-layer neural network possesses the higher parametrical reliability.
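The count of exactly three pairs follows from simple arithmetic over the stage 2 and stage 3 sets ($H_1 = 2, \ldots, 7$ for two-layer networks with one output neuron; $H_1 = 3, \ldots, 7$ followed by 2 + 1 neurons for three-layer networks). A small check, with a hypothetical helper name:

```python
def equal_size_pairs(two_layer_h1=range(2, 8), three_layer_h1=range(3, 8)):
    """Pair two-layer (H1 + 1) and three-layer (H1 + 2 + 1) networks
    that have the same total number of neurons."""
    pairs = []
    for h in three_layer_h1:
        total = h + 2 + 1
        for g in two_layer_h1:
            if g + 1 == total:
                pairs.append((total, (h, 2, 1), (g, 1)))
    return pairs
```

Three-layer networks with $H_1 = 6, 7$ (9 and 10 neurons) have no two-layer partner, because that would require $H_1 = 8$ or $9$ in the two-layer set.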

The performed experiments allow one to make the following conclusions: 


1. The correct recognition probability decreases as the variance of the weighting coefficients and threshold increases, at a fixed shift of the mathematical expectation;

2. A neuron possesses maximal parametrical reliability when the hyperplane it realizes is drawn through the midpoints of the corresponding hypercube edges;

3. The parametrical reliability of both two- and three-layer neural networks increases with the number of neurons in the first layer;

4. Replacing a two-layer neural network by a three-layer neural network that realizes the same logical function with the same number of neurons in the first layer decreases the parametrical reliability;
5. A comparison of two- and three-layer neural networks with the same total number of neurons (across all the layers) shows that the two-layer neural network possesses the higher parametrical reliability.

Fig. 15.12. Average curves of the correct recognition probability versus the variance of the weighting coefficients for two- and three-layer neural networks with the same total number of neurons


When a sufficiently large number of multilayer neural network realizations is required, the following experimental plan can be carried out in addition to the one described above:


To investigate the dependence of the neural network parametrical reliability on


1. The number of layers at fixed values of $H_1, H_2, \ldots$;
2. The dimensionality of the neural network input signal ($H_0$, $W$);
3. The number of neurons at a fixed number of neural network inputs.


15.5 
Investigation of the Multilayer Neural Network's Functional Reliability 
in the Case of Catastrophic Failures 


The experimental method for investigating the functional reliability of a multilayer neural network in the case of catastrophic failures consists in successively modeling single-fold failures of the logical constant type (stuck-at-0 and stuck-at-1 faults) at the neuron inputs and outputs and computing the correct recognition probability for each failure.
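The successive modeling of single-fold logical-constant failures can be sketched as follows. The network is a hypothetical two-layer XOR network (the source does not give the weights of the network behind Table 15.1); layer, neuron, and input indices are 0-based; and a fault is modeled as a neuron seeing a constant 0 or 1 on one input, or emitting a constant on its output. The names `forward` and `single_fault_crp` are illustrative.

```python
import itertools
import numpy as np

# Hypothetical XOR network (first layer: OR, AND; output: OR and not AND).
XOR_NET = [
    (np.array([[1.0, 1.0], [1.0, 1.0]]), np.array([0.5, 1.5])),
    (np.array([[1.0, -1.0]]), np.array([0.5])),
]

def forward(x, layers, fault=None):
    """Threshold-network forward pass with an optional single fault
    fault = (layer, neuron, pin, const), pin = input index or "out"."""
    v = np.asarray(x)
    for l, (W, a) in enumerate(layers):
        s = W @ v
        if fault is not None and fault[0] == l and fault[2] != "out":
            _, i, j, c = fault
            v_f = v.copy()
            v_f[j] = c                      # neuron i sees a constant on input j
            s[i] = W[i] @ v_f
        y = (s - a > 0).astype(int)
        if fault is not None and fault[0] == l and fault[2] == "out":
            y[fault[1]] = fault[3]          # neuron output stuck at a constant
        v = y
    return v

def single_fault_crp(layers, n_inputs):
    """CRP for every single stuck-at-0/1 fault at every neuron input and output."""
    inputs = list(itertools.product([0, 1], repeat=n_inputs))
    targets = [forward(x, layers) for x in inputs]
    crp = {}
    for l, (W, _) in enumerate(layers):
        for i in range(W.shape[0]):
            for pin in [*range(W.shape[1]), "out"]:
                for c in (0, 1):
                    f = (l, i, pin, c)
                    ok = sum(np.array_equal(forward(x, layers, f), t)
                             for x, t in zip(inputs, targets))
                    crp[f] = ok / len(inputs)
    return crp
```

For this toy network the result has (2·3 + 1·3)·2 = 18 entries. Counted the same way, the network of Table 15.1 (three, two, and one neurons with three, three, and two inputs, respectively) has (3·4 + 2·4 + 1·3)·2 = 46 single failures, which matches the 46 possible failures mentioned below.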


Table 15.1. Results of investigations of the neural network reliability


Layer  Neuron  Failure location  CRP (const = 0)  CRP (const = 1)
1      1       Input 1           0.875            0.875
1      1       Input 2           0.875            0.875
1      1       Input 3           0.875            0.875
1      1       Output            0.875            0.375
1      2       Input 1           0.875            0.875
1      2       Input 2           0.875            0.875
1      2       Input 3           0.875            0.875
1      2       Output            0.875            0.375
1      3       Input 1           0.875            0.875
1      3       Input 2           0.875            0.875
1      3       Input 3           0.875            0.875
1      3       Output            (illegible)      (illegible)
2      1       Input 1           0.875            0.375
2      1       Input 2           1.000            1.000
2      1       Input 3           0.875            0.375
2      1       Output            0.750            0.375
2      2       Input 1           1.000            1.000
2      2       Input 2           0.875            0.375
2      2       Input 3           1.000            1.000
2      2       Output            0.875            0.375
3      1       Input 1           0.750            0.375
3      1       Input 2           0.875            0.375
3      1       Output            0.625            0.375

CRP: Correct Recognition Probability.
Let us describe the search for “potentially dangerous” failures based on the investigation of the network's catastrophic reliability, as well as the methods for determining the network's logical redundancy.

The results of the investigation are given in Table 15.1. They show that, for the class of failures of the logical constant type at neuron inputs and outputs, with all failures assumed equally probable, the analyzed network has a logical redundancy coefficient of 6/46: for six of the 46 possible failures, the correct recognition probability remains equal to 1. One can also single out “potentially dangerous” failures by choosing a minimum acceptable value of the correct recognition probability. Let it be 0.75. Then there exist only eleven “potentially dangerous” failures, each with a correct recognition probability of 0.375: for example, constant 1 at the output of neuron 1 of the first layer, constant 1 at input 1 of neuron 1 of the second layer, etc.
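As an arithmetic illustration of the redundancy coefficient and the danger threshold, the sketch below re-tabulates the second-layer rows of Table 15.1. The dictionary layout and the function `summarize` are hypothetical. All six failures with a correct recognition probability of 1 lie in this layer, so the numerator 6 of the coefficient 6/46 is already reproduced by this subset; the denominator is 16 here because only second-layer failures are included.

```python
# Second-layer rows of Table 15.1: (neuron, pin) -> (CRP at const 0, CRP at const 1).
LAYER2_ROWS = {
    (1, "in1"): (0.875, 0.375),
    (1, "in2"): (1.000, 1.000),
    (1, "in3"): (0.875, 0.375),
    (1, "out"): (0.750, 0.375),
    (2, "in1"): (1.000, 1.000),
    (2, "in2"): (0.875, 0.375),
    (2, "in3"): (1.000, 1.000),
    (2, "out"): (0.875, 0.375),
}

def summarize(rows, crp_min):
    """Return (number of failures with CRP = 1, number of failures,
    sorted list of failures whose CRP is below crp_min)."""
    crp = {(n, pin, c): v for (n, pin), pair in rows.items()
           for c, v in enumerate(pair)}
    redundant = sum(1 for v in crp.values() if v == 1.0)
    dangerous = sorted(f for f, v in crp.items() if v < crp_min)
    return redundant, len(crp), dangerous
```

With `crp_min = 0.75`, the output stuck at 0 of neuron 1 (CRP exactly 0.750) is not flagged, and the five flagged failures all have CRP 0.375.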

Thus, the experimental study of the neural network catastrophic reliability allows 
one to take into account and to use the logical redundancy for eliminating the possi- 
bility of “potentially dangerous” failures. This can significantly increase the reliability 
of the designed logical devices. 
Chapter 16


Neural Network Diagnostics 


Introducing dynamic redundancy into the structure of digital devices to improve their reliability requires technical diagnostics of the failures that occur in the structures implementing these devices.

The diagnostic methods proposed in several works can be divided into two groups: methods for the diagnostics and control of individual neurons at the level of their functional components (multiplier, adder), and methods for the diagnostics of neural networks at the level of individual neurons. The algorithms of the first group are described in detail in [15-15, 15-37, 16-1 to 16-4]. Their authors gave a complete classification of neuron failure types and developed algorithms for synthesizing tests that detect neuron failures and localize them to the level of individual inputs and outputs. Algorithms for synthesizing neurons without logically indistinguishable failures, to the same level of inputs and outputs, are developed in [15-15]. These algorithms are sufficiently efficient for diagnosing individual neurons, but they are practically inapplicable to neural networks with a large number of neurons.

The algorithms of the second group are described in [16-5, 16-6]. They include diagnostic methods for neuron circuits of special types (two-neuron combinations, cascade circuits, etc.). The disadvantage of these diagnostic procedures is their limited practical applicability.

Below we describe algorithms for the technical diagnostics of neural networks that check their performance and localize failures to the level of an individual neuron. The algorithm for localizing neuron failures in the neural network is based on the investigation of the network state graph introduced in Sect. 16.1. The same technique yields an algorithm for designing the minimum fault detection test for failures of the logical constant type. The method of adaptive diagnostics presented in Sect. 16.4 is based on the synthesis of an adaptive diagnostic network in the form of a neural network. It localizes any failure of the logical constant type at a neuron input or output during one cycle of neural network operation.

By the method of realization, all the diagnostic algorithms considered in this chapter can be divided into two groups: software and hardware.

The first group includes the algorithm of failure localization in the neural network (Sect. 16.2) and the algorithm for designing the minimum fault detection test for failures of the logical constant type (Sect. 16.3). The former is based on constructing and investigating the neural network state graph. The latter is based on constructing the minimum neural network state graph, which corresponds to the minimum fault detection test for failures of the logical constant type at the neuron outputs.
The second group includes the method of adaptive diagnostics of failures of the logical constant type at neuron inputs and outputs. This method is based on modeling all possible failures of a given type, creating a learning sample, and synthesizing the adaptive diagnostic network.


16.1 
Neural Network State Graph - The Main Notions and Definitions 


Different methods exist for describing neural network functioning: analytical, structural, geometrical, etc. Although each of these methods provides a complete description of a given neural network, each mainly reflects one particular aspect of the network functioning. Below we introduce the notion of the neural network state, which describes the logic of its functioning.


Definition 1. Let us call the vector $a_{ij}$ that represents the outputs of all the neurons of the i-th layer and satisfies the condition

$a_{ij} = \{a_{ij}^1, a_{ij}^2, \ldots, a_{ij}^{H_i}\}, \quad a_{ij}^l \in \{0, 1\},$

the j-th node of the i-th layer of the state graph. Here $H_i$ is the number of neurons in the i-th layer, and $a_{ij}^l$ is the output value of the l-th neuron in the i-th layer.


Definition 2. A state graph branch is a directed segment linking two state graph nodes and is designated as $a_{ij} \to a_{lk}$, where $l = i + 1$.
Definition 3. The state graph nodes of the zero level, which represent the values of the input variable, are called the tops of the state graph.


Definition 4. The state graph nodes of the W-th level (W is the number of neural net- 
work layers) are called the roots of the state graph. 


Definition 5. A path in the state graph is a chain of nodes, linked by branches according to the functioning of the neural network, that begins at a top and ends at a root.


Definition 6. The state graph is a tree-like directed disconnected graph composed of paths and of nodes located at the corresponding levels.


Statement 1. The state graph completely describes the neural network functioning for all values of the input variable. To show this, consider all the nodes of the state graph belonging to an arbitrary path:

$a_{0j_0} \to a_{1j_1} \to \cdots \to a_{Wj_W}$

Since the node $a_{0j_0}$ represents the value of the input variable, and the nodes $a_{ij_i}$ ($i = 1, 2, \ldots, W$) represent the outputs of all the neurons of all the layers in increasing order of layer number, the network functioning, i.e., its total response to the given input action $a_{0j_0}$, is completely determined. Since the state graph is the aggregate of all possible paths, it determines the neural network response to all input actions.


Definition 7. A neuron failure is called critical if the fault, which produces an error at the output of this neuron for one or several input values, ultimately produces an error at the output of the whole neural network. A neuron failure is called uncritical if no error at its output produces an error at the output of the whole neural network.


Definition 8. A path in the state graph is called faulty if it corresponds to the neural network with a critical failure, i.e., the fault path has a root corresponding to a wrong value of the logical function realized by the neural network.


Definition 9. A path that has the required root, i.e., a path corresponding to the neural network without failures, with uncritical failures, or with critical failures that do not affect the response to the given input value, is called correct.


Definition 10. A complete state graph is a graph with $2^n$ tops, where n is the dimensionality of the neural network input.


Definition 11. A closed region formed by hyperplanes (realized by neurons) and hypercube faces is called a hypercube compartition. Each compartition has a number determined by the outputs of the neurons, i.e., the number of a compartition is a node of the state graph.
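Definitions 1-11 can be made concrete by enumerating the state graph of a small network. The sketch assumes threshold neurons that output 1 when the weighted sum exceeds the threshold; the XOR weights and the name `state_graph` are hypothetical. Each path is stored as the tuple (top, first-layer node, ..., root), and `levels[k]` lists the distinct nodes at level k; for k = 1 these are the numbers of the nonempty compartitions (Definition 11).

```python
import itertools
import numpy as np

# Hypothetical XOR network (first layer: OR, AND; output: OR and not AND).
XOR_NET = [
    (np.array([[1.0, 1.0], [1.0, 1.0]]), np.array([0.5, 1.5])),
    (np.array([[1.0, -1.0]]), np.array([0.5])),
]

def state_graph(layers, n_inputs):
    """Return (paths, levels): the path of every top (input value), from the
    top through each layer's node to the root, and the distinct nodes per level."""
    paths = {}
    for x in itertools.product([0, 1], repeat=n_inputs):
        nodes, v = [x], np.asarray(x)
        for W, a in layers:
            v = (W @ v - a > 0).astype(int)
            nodes.append(tuple(int(b) for b in v))
        paths[x] = tuple(nodes)
    levels = [sorted({p[k] for p in paths.values()})
              for k in range(len(layers) + 1)]
    return paths, levels
```

Here level 1 contains three nodes: the two first-layer hyperplanes split the square into three nonempty compartitions, with 00 in one, 01 and 10 sharing another, and 11 in the third. The graph is complete in the sense of Definition 10, with $2^2 = 4$ tops.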


16.2 
Algorithm of Failure Localization in the Neural Networks 


The essence of the proposed algorithm will be explained with particular examples. Let us first consider single-fold failures and then generalize the results to multiple failures.

Let us take the three-layer neural network with three neurons in the first layer, two neurons in the second layer, and one neuron in the third layer. The locations of the hyperplanes realized by the first-, second-, and third-layer neurons are shown in Fig. 16.1a-c, respectively, where $x_i^j$ is the output value of the i-th neuron in the j-th layer (for j = 0, it is a value of the input variable). Crosses indicate the values that produce 1 at the neural network output, and circles indicate the values that produce 0. The numeral at each hyperplane indicates the number of the corresponding neuron in its layer.

The full state graph of this neural network is shown in Fig. 16.2. Let us consider, for example, a critical parametrical failure of the first neuron in the first layer. The hyperplane positions in this case are shown in Fig. 16.3a-c. Because the weighting coefficients of the failed neuron have changed, the hypercube top 110 has moved into another compartition, number 100 (it was initially in compartition 000). This produces an error at the neural network output: the crosses and circles are no longer all separated into different compartitions. The full state graph corresponding to this failure is shown in Fig. 16.4, where the dashed line indicates the one-valued branches of the fault paths.
Fig. 16.1. Hyperplanes realized by the three-layer network without failures


Fig. 16.2. The state graph of the failure-free three-layer neural network (Fig. 16.1)
Fig. 16.3. Hyperplanes realized by the three-layer neural network with failure


Fig. 16.4. The state graph of the three-layer neural network with failure (Fig. 16.3)


Each neural network input value corresponds to a single path in the state graph. Consequently, the transfer of a hypercube top into another compartition corresponds to the transformation of the path beginning at this top into a fault path. The new path has the same top, but some of its other nodes are changed.
Fig. 16.5. Hyperplanes realized by the three-layer neural network with failure
Fig. 16.6. The state graph of the three-layer neural network with failure (Fig. 16.5)


Figure 16.5a-c shows another example of a parametrical failure of the same neuron. As a result of the fault, the tops 100 and 110 of the unit hypercube have moved into the new compartitions 000 and 100, respectively. This corresponds to the emergence of two fault paths in the state graph in Fig. 16.6.

Using the state graph, the search for the failure neuron consists in finding a transformation of one or several fault paths into correct paths that does not produce additional fault paths. The form of this transformation (the positions of the nodes in the state graph whose values change) must indicate the number of the failure neuron. In the example of Fig. 16.4, the only fault path

$110 \to 100 \to 00 \to 1$    (16.1)

must be transformed into the path

$110 \to 000 \to 10 \to 0$    (16.2)


Comparing (16.1) and (16.2), one sees that some node positions have changed their values. The first distinct position indicates a possible failure of the first neuron in the first layer. The validity of this conclusion follows from the statement below.


Statement 2. When the state graph is used to search for a neuron failure in the neural network, only one-valued branches of the fault paths can be transformed. Proof. In the state graph of the failure-free neural network, the values of the logical function (the roots of the state graph) are correctly and unambiguously assigned to all the tops. When a failure occurs, some of the branches change and a fault branch emerges. Each fault path corresponds to a single error at the neural network output. The required transformation is the inverse of the transformation that produced the failure, i.e., it eliminates the failure and does not add any new one. Suppose we transform a many-valued branch of a fault path. Then all the other paths passing through this node obtain another root. Since these paths were correct by assumption, the transformation turns them into fault paths. This contradicts the feasibility of the transformation, and the statement is therefore proved.

For the other failure, whose state graph is shown in Fig. 16.6, there are two fault paths, with the tops 100 and 110. The first path has a single one-valued branch, at the zero level. Let us consider all possible transformations of this branch into the branches of the corresponding parts of the correct paths, $(\to 100 \to 00 \to 1)$ and $(\to 010 \to 11 \to 1)$. The transformations then have the following form:

$100 \to (0 \to 1)00 \to 00 \to 1$    (16.3)

$100 \to 0(0 \to 1)0 \to 11 \to 1$    (16.4)


where the two values in brackets indicate the change of the value at this position during the transformation. From (16.3) one can suspect that the failure neuron is the first neuron in the first layer; similarly, from (16.4) one can suspect the second neuron in the first layer. For the zero-level branch of the second fault path, one can write

$110 \to (1 \to 0)00 \to 10 \to 0$    (16.5)


i.e., the first neuron of the first layer is again suspected. The second fault path is also one-valued at the first level. Consequently, the following transformation is possible:

$110 \to 100 \to (0 \to 1)0 \to 0$    (16.6)


This suggests a possible failure of the first neuron in the second layer. To identify the required neuron among all the suspected ones, let us prove the following statement.


Statement 3. Suppose that there is one critical neuron failure in the neural network and more than one fault path in the state graph. If the set of numbers of the suspected failure neurons is obtained, then the neuron number that occurs in this set the maximum number of times is the number of the failure neuron. Proof. Let there be N fault paths and one critical neuron failure in the state graph. By the nature of any failure, there always exists a transformation inverse to it. Suppose this transformation corresponds to a neuron number that does not occur the maximum number of times among the suspected neuron numbers. Then not all the fault paths are transformed (as follows from the search procedure for the suspected neurons), which contradicts the assumptions above.
According to the above statements, the failure neuron in this example is the first neuron in the first layer, because its number occurs twice among the suspected neurons.
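Statement 3 amounts to a vote over the suspected neuron numbers. A minimal sketch, using the (layer, neuron) pairs suspected by transformations (16.3)-(16.6) in the example above; the list name and the function `most_suspected` are illustrative:

```python
from collections import Counter

# Suspected (layer, neuron) pairs from transformations (16.3)-(16.6).
SUSPECTS = [(1, 1), (1, 2), (1, 1), (2, 1)]

def most_suspected(suspects):
    """Return the neuron number that occurs most often among the suspects."""
    neuron, _count = Counter(suspects).most_common(1)[0]
    return neuron
```

Statement 3 does not address ties. `Counter.most_common` orders equal counts by first occurrence, so in that case the sketch simply returns the earliest suspect and should not be read as resolving the tie.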

The search process of the failure neuron number can be simplified on the basis of 
the following statement. 


Statement 4. Let us compare the fault path in the state graph with the corresponding correct path sequentially, from the top to the root. The first distinct position indicates the number of the failure neuron. Proof. According to Statement 2, when the fault path has several one-valued branches, an individual transformation must be sought for each branch. We want to prove, however, that it is sufficient to find a single transformation, namely for the branch belonging to the uppermost level.


Let us consider one failure in the neural network and compare the nodes of the fault path sequentially with those of the corresponding correct path. Let the first distinct position correspond to the i-th neuron of the j-th layer, and let the branches of the j-th and (j + 1)-th levels be one-valued. The noncoincidence at this position indicates an error at the output of the i-th neuron in the j-th layer. Since the nodes of all the previous layers coincide, this i-th neuron in the j-th layer is the failure neuron.

By assumption, only a single failure has occurred in the neural network. Hence all the other noncoincidences of the nodes are caused by the failure of this i-th neuron in the j-th layer.

Let us list the main stages of the algorithm for localizing the failure neuron in the case of a single-fold failure. The state graph of the correctly functioning neural network is assumed to be given.


1. The values of the input variable are sequentially applied to the neural network input; 

2. The neuron outputs are stored for each input value (a path in the state graph is 
created); 

3. The obtained output value is compared with the root of the corresponding path in 
the given state graph (a path with the same top). If the roots coincide then go to p. 1, 
otherwise go to p. 4; 

4. The positions of both paths in the state graph are compared from the top to the 
root; 

5. The first distinct position indicates the number of the failure neuron, and the pro- 
cess terminates because the failures are single-fold. 
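Steps 1-5 can be sketched as a comparison of stored paths. This is an illustrative sketch: paths are represented as tuples of bit strings, one per level, the correct state graph is passed in as a dictionary keyed by top, and the name `localize_single_fault` is hypothetical. The usage below applies it to the fault path 110 → 100 → 00 → 1 and the correct path 110 → 000 → 10 → 0 of the Fig. 16.4 example (our reading of (16.1) and (16.2)).

```python
def localize_single_fault(reference, observed):
    """Single-fold failure localization by path comparison.

    reference, observed: dict mapping each top (input value) to its path,
    a tuple of node bit strings from level 0 (top) to level W (root).
    Returns the 1-based (layer, neuron) of the failure neuron, or None."""
    for top, path in observed.items():           # steps 1-2: paths per input
        correct = reference[top]
        if path[-1] == correct[-1]:               # step 3: roots coincide
            continue
        for level in range(1, len(path)):         # step 4: scan top to root
            for k, (got, want) in enumerate(zip(path[level], correct[level])):
                if got != want:                   # step 5: first distinct position
                    return level, k + 1
    return None
```

For the Fig. 16.4 paths the first distinct position is the first bit at level 1, so the sketch returns (1, 1), the first neuron of the first layer, as stated in the text.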


The proposed algorithm can easily be generalized to multi-fold failures. The example in Figs. 16.5 and 16.6 shows that a failure can cause errors in subsequent layers, so that the node comparison may mistake an after-effect of a failure for the failure itself. The failure search algorithm for multi-fold failures is as follows:


1. The values of the input variable are sequentially applied to the neural network input; 
2. The neuron outputs are stored for each input value (a path in the state graph is 
created); 
3. The obtained output value is compared with the root of the corresponding path in the given state graph (a path with the same top). If the roots coincide and the applied input value is not the last one, then go to p. 1, otherwise go to p. 4;

4. The positions of both paths in the state graph are compared from the top to the root;

5. The first distinct position indicates the number of the failure neuron. If the applied input value is not the last one, then go to p. 1, otherwise go to p. 6;

6. Perform the failure correction and go to p. 1.


If the failure neurons are located in W layers, then the process is repeated W times. 
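The localize-correct-repeat cycle can be simulated. This sketch makes several simplifying assumptions: failures are modeled only as neuron outputs stuck at a logical constant, a "correction" is simulated by removing the localized fault, the scan restarts after every correction instead of after each full pass as in step 6, and the XOR network and all names are hypothetical. Because all earlier positions coincide, the neuron at the first distinct position receives correct inputs, so its deviation must come from its own fault; this guarantees that each pass removes a genuine fault.

```python
import itertools
import numpy as np

# Hypothetical XOR network (first layer: OR, AND; output: OR and not AND).
XOR_NET = [
    (np.array([[1.0, 1.0], [1.0, 1.0]]), np.array([0.5, 1.5])),
    (np.array([[1.0, -1.0]]), np.array([0.5])),
]

def state_path(x, layers, stuck):
    """Path top -> root for input x; stuck maps (layer, neuron) -> constant (0-based)."""
    nodes, v = [tuple(x)], np.asarray(x)
    for l, (W, a) in enumerate(layers):
        v = (W @ v - a > 0).astype(int)
        for (fl, fn), c in stuck.items():
            if fl == l:
                v[fn] = c  # output stuck at a logical constant
        nodes.append(tuple(int(b) for b in v))
    return nodes

def localize_multiple(layers, n_inputs, stuck):
    """Localize, correct, and repeat until no input gives an output error.
    Returns the localized neurons as 1-based (layer, neuron) pairs."""
    remaining, found = dict(stuck), []
    inputs = list(itertools.product([0, 1], repeat=n_inputs))
    while True:
        for x in inputs:
            ref = state_path(x, layers, {})
            obs = state_path(x, layers, remaining)
            if ref[-1] != obs[-1]:
                # First distinct position, scanning from the top to the root.
                level, k = next((lv, j) for lv in range(1, len(ref))
                                for j in range(len(ref[lv])) if ref[lv][j] != obs[lv][j])
                found.append((level, k + 1))
                remaining.pop((level - 1, k), None)  # simulated correction
                break
        else:
            return found  # no output errors remain
```

With the first-layer OR neuron stuck at 0 and the output neuron stuck at 1, the sketch first localizes the output neuron (2, 1) and then, after its correction, the first-layer neuron (1, 1).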

To estimate the operation speed of the proposed method, let us compare it with the enumerative technique, which tests each neuron of the neural network, in the case of single-fold failures. The state graph of the correctly functioning neural network is assumed to be given. Let us construct the state graph of the neural network with failures.


A. Apply all the values of the input variable sequentially to the neural network input and obtain a state graph path and an output value for each input value. Compare the result with that of the correct neural network (without failures). If both outputs coincide, apply the next input value; otherwise, go to p. B. Thus, after the full set of values has been applied to the neural network input, $2^{H_0}$ elementary operations of bit-by-bit comparison have been performed ($H_0 = n$ is the dimensionality of the neural network input).


B. A fault path has been detected in the state graph. Compare it with the corresponding path of the correct neural network. The first distinct position indicates the number of the failure neuron. If only one neuron failure exists, then at most

$\sum_{i=1}^{W} H_i$

elementary comparisons are performed after the full set of input values has been applied. The maximum number of comparison operations for the failure localization algorithm can then be expressed as

$N_{1\max} = 2^{H_0} + \sum_{i=1}^{W} H_i$    (16.7)

The value $N_1$ reaches its maximum (16.7) when the failure neuron has the last number in the last layer and the fault path is the last one.

Let us now consider testing the neural network with the enumerative technique. If the complete test for one neuron of the i-th layer has length $2^{N_i}$ ($N_i$ is the number of neuron inputs in the i-th layer, $N_i = H_{i-1}$) and all the neurons must be tested, then the number of elementary comparisons for enumeration is

$N_{2\max} = \sum_{i=1}^{W} H_i \, 2^{N_i}$    (16.8)

The value $N_2$ is maximal in the sense that the test length for one neuron is estimated as $2^{N_i}$.

Let us show the validity of the inequality

$N_{1\max} < N_{2\max}$    (16.9)


Taking into account (16.7) and (16.8), inequality (16.9) becomes

$2^{H_0} + \sum_{i=1}^{W} H_i < \sum_{i=1}^{W} 2^{H_{i-1}} H_i$

Writing out the sums explicitly, one gets

$2^{H_0} + H_1 + \cdots + H_W < H_1 2^{H_0} + H_2 2^{H_1} + \cdots + H_W 2^{H_{W-1}}$    (16.10)


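Taking into account the evident inequalities

$H_2 < H_2 2^{H_1}, \quad \ldots, \quad H_W < H_W 2^{H_{W-1}}, \quad \text{if } H_i > 1, \; i = 2, \ldots, W,$

each of the terms $H_2, \ldots, H_W$ on the left-hand side of (16.10) is smaller than the corresponding term on the right-hand side, so (16.10) holds whenever

$2^{H_0} + H_1 < H_1 2^{H_0}$    (16.11)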


The cases of practical interest are $H_0 > 3$ and $H_{1\max} = 2^{H_0}$. In this case, expression (16.11) takes the form

$2^{H_0 + 1} < 2^{2H_0}$    (16.12)

Taking the logarithm of (16.12), one gets $H_0 > 1$, which is always valid. Inequality (16.9) is therefore proved.

Thus, the proposed failure localization algorithm is always faster than the algorithms 
based on the enumerative technique. 
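Formulas (16.7) and (16.8) are easy to evaluate for concrete layer sizes. The sketch takes the layer sizes as a list `h = [H_0, H_1, ..., H_W]`; the function names are illustrative. For the network of Fig. 16.1 (three inputs, then three, two, and one neurons), $N_{1\max} = 2^3 + 3 + 2 + 1 = 14$, whereas $N_{2\max} = 3 \cdot 2^3 + 2 \cdot 2^3 + 1 \cdot 2^2 = 44$.

```python
def n1_max(h):
    """(16.7): 2**H_0 bit comparisons plus sum of H_i node comparisons."""
    return 2 ** h[0] + sum(h[1:])

def n2_max(h):
    """(16.8) with N_i = H_(i-1): sum over layers of H_i * 2**H_(i-1)."""
    return sum(h[i] * 2 ** h[i - 1] for i in range(1, len(h)))
```

The case behind (16.12), a single layer with $H_1 = 2^{H_0}$ neurons, gives $n1\_max = 2^{H_0+1}$ and $n2\_max = 2^{2H_0}$, so the inequality holds for every $H_0 > 1$.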

Let us perform similar operation speed estimations for multi-fold failures. Consider m failures located in k layers, and let the failure localization be the worst case, $k = k_{\max} = W$, so that the localization procedure is repeated W times. Then inequality (16.9) takes the form

$W\left(2^{H_0} + \sum_{i=1}^{W} H_i\right) < \sum_{i=1}^{W} 2^{H_{i-1}} H_i$    (16.13)
Since $W \le H_1$, so that $W 2^{H_0} \le H_1 2^{H_0}$, inequality (16.13) holds whenever

$W \sum_{i=1}^{W} H_i < \sum_{i=2}^{W} 2^{H_{i-1}} H_i$    (16.14)


In the case $H_1 \ge H_2 > 1$, $W \le H_1$, one obtains

$W(H_1 + H_2) < 2^{H_1} H_2$,

and inequality (16.14) reduces to

$W \sum_{i=3}^{W} H_i < \sum_{i=3}^{W} H_i 2^{H_{i-1}}$    (16.15)

Inequality (16.15) is valid both for $H_1 > H_2 > \cdots > H_W$ and for $H_1 = \cdots = H_W$. Then inequality (16.13) is also valid.

Let us now estimate (by a lower-bound estimate) the relative speed gain obtained by replacing the enumerative technique with the failure localization algorithm in the case of multi-fold failures:

$\dfrac{N^*_{1\max}}{N_{2\max}} = \dfrac{W\left(2^{H_0} + \sum_{i=1}^{W} H_i\right)}{\sum_{i=1}^{W} 2^{H_{i-1}} H_i}$    (16.16)

where $N^*_{1\max}$ is the lower-bound estimate of the operation speed of the failure localization algorithm for multi-fold failures. If $H_0 = H_1 = \cdots = H_W = H$, expression (16.16) takes the form


$\dfrac{N^*_{1\max}}{N_{2\max}} = \dfrac{1}{H} + \dfrac{W}{2^H}$    (16.17)
If $H_1 > H_2 > \cdots > H_W$, one can use some average value $\bar{H}$ instead of H in (16.17), for example,

$\bar{H} = \dfrac{H_1 + H_2 + \cdots + H_W}{W}$
For sufficiently large H, the second summand $W/2^H$ in (16.17) can be neglected, so the ratio is approximately $1/H$. Hence, according to the lower-bound estimate, the relative speed gain obtained by replacing sequential enumerative testing with the proposed failure localization algorithm increases linearly with the number of neurons in the layers.
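The ratio (16.16) and its equal-layer form (16.17) can be checked with exact rational arithmetic. The sketch assumes, as in (16.13), that the single-fold count is multiplied by W for multi-fold failures; the function name is illustrative and `h = [H_0, H_1, ..., H_W]`. For $H = 8$ and $W = 3$ both forms give $840/6144 = 35/256 = 1/8 + 3/256$.

```python
from fractions import Fraction

def speed_ratio(h):
    """(16.16): N*_1max / N_2max with N*_1max = W * (2**H_0 + sum of H_i)."""
    w = len(h) - 1
    n1_star = w * (2 ** h[0] + sum(h[1:]))
    n2 = sum(h[i] * 2 ** h[i - 1] for i in range(1, len(h)))
    return Fraction(n1_star, n2)
```

Doubling H from 8 to 16 at W = 3 roughly halves the ratio, since the term $W/2^H$ quickly becomes negligible; this is the linear growth of the speed gain with H stated above.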
16.3
Algorithm of the Minimum Test Design for the Failures 
of the Logical Constant Type at the Neuron Outputs 


The proposed algorithm for designing the minimum fault detection test has a restricted field of application: the test does not check all faults of the constant type, but only faults of the logical constant type at the neuron outputs. It can therefore be used only when the neuron inputs are failure-free.

Let us consider the neural network input space, a unit hypercube divided into compartitions by the hyperplanes realized by the first-layer neurons. Let every compartition except one contain one hypercube corner, and let the selected compartition contain m corners. Let the number of this compartition be $a_{1j}$:

$a_{1j} = \{a_{1j}^1, a_{1j}^2, \ldots, a_{1j}^{H_1}\}, \quad a_{1j}^l \in \{0, 1\}$


Suppose that the failure of some neuron changes some value $a_{1j}^l$. Since only failures of the logical constant type are considered, the compartition number then changes for all m corners lying inside this compartition. Consequently, this failure is manifested at all m input values. However, a failure needs to be manifested at no more than one input value of the test. The minimization of the full test (the test with $2^n$ input values) therefore consists in searching for compartitions that contain more than one corner and eliminating the excess corners from each such compartition, so that only one corner remains inside it.

Since the hypercube corners are at the same time the tops of the state graph, the full test minimization described above is equivalent to the minimization of the full state graph. Let us consider the case shown in Figs. 16.1 (neural network) and 16.2 (state graph). The fact that several hypercube corners belong to one compartition is reflected in the state graph by all the paths starting at these corners sharing common branches from the first level on. Let us eliminate such redundant corners. In the considered example, five corners share one compartition:


(000, 011, 101, 110, 111) (16.18) 


Let us keep only one top from the set (16.18), for example 000. The state graph then takes the form shown in Fig. 16.7.


Fig. 16.7. Minimized state graph
The tops of this minimized state graph represent the input values of the minimized test
(000, 001, 010, 100) (16.19) 
Let us prove that this test is minimal across the given above set of failures. 


Statement 5. In the case of failures of the logical constant type, the number of tops of the minimized state graph of the neural network is equal to the length of the minimum test detecting all the failures of the given class.


By the definition of the state graph, the number of tops determines the length of the corresponding test. Let us now prove that this test is minimal for the given class of failures.

The state graph minimization consists in sequentially enumerating the nonempty compartitions formed by the first-layer neurons and eliminating the second, third, etc., input space hypercube corners belonging to the same compartition. As a result, the minimized state graph has only one argument value in each nonempty compartition.

Let us assume that the obtained test is not minimal, so that some top can be eliminated from the obtained assemblage. The elimination of any top, however, leaves a compartition without test inputs, and the value of the logical function in this compartition becomes undetermined. Therefore, the reduced test cannot produce the fault path corresponding to the eliminated top. Since each path in the minimized state graph corresponds uniquely to some failure group, this failure group becomes undetectable by the reduced test. This contradicts the definition of the test, i.e., the detectability of all the failures of the given class. Since the eliminated top was arbitrary, the test corresponding to the minimized state graph is minimal, and the statement is proved.

Taking all of the above into account, the minimal test for failures of the logical constant type at the neuron outputs can be designed according to the following scheme:


1. Two levels of the full state graph (the zero-order and first-order ones) are constructed for the correctly functioning neural network. Each input value corresponds to a node of the first level of the state graph;

2. The number of distinct (non-repeating) first-level nodes gives the length of the minimal test, and the tops corresponding to them provide the minimal test input values.
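The scheme above can be sketched in Python for the simplest case in which the first layer consists of hard-limiting threshold neurons: each hypercube corner is mapped to its compartition number $a_i$, and one corner is kept per distinct number. The weights and thresholds below are hypothetical, chosen only so that the corners 000, 011, 101, 110, 111 fall into one compartition as in (16.18); they are not taken from Fig. 16.1.

```python
from itertools import product

def threshold(weights, theta, x):
    """Hard-limiting neuron: returns 1 if the weighted sum exceeds theta, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > theta else 0

def compartition_number(layer, x):
    """Binary vector (a_i^1, ..., a_i^H1) produced by the first layer for input x."""
    return tuple(threshold(w, t, x) for w, t in layer)

def minimal_test(layer, n):
    """Keep one hypercube corner per nonempty compartition (the first one met)."""
    kept = {}
    for x in product((0, 1), repeat=n):                     # corners in lexicographic order
        kept.setdefault(compartition_number(layer, x), x)   # later corners are excess
    return list(kept.values())

# Hypothetical first layer: (weights, threshold) pairs, not taken from Fig. 16.1.
first_layer = [((-1, -1, 1), 0.5),   # fires only on corner 001
               ((-1, 1, -1), 0.5),   # fires only on corner 010
               ((1, -1, -1), 0.5)]   # fires only on corner 100

min_test = minimal_test(first_layer, 3)
print(["".join(map(str, x)) for x in min_test])   # ['000', '001', '010', '100']
```

With this layer, the sketch reproduces the minimized test (16.19): four inputs, one per nonempty compartition.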


16.4 
Method of the Neural Network Adaptive Failure Diagnostics 


If the neural network must operate continuously with high reliability, its operation cannot be interrupted to perform diagnostics. The failed neuron must therefore be localized at the first application of an input signal that reveals the neuron failure. The method of adaptive failure diagnostics of the neural network is used in this case. The adaptive diagnostic network is synthesized in the form of a neural network with full sequential connections. It must be able to localize any failure of the logical constant type at a neuron input or output at the first application of the input value that reveals the neuron failure, i.e., to perform diagnostics in parallel with the functioning of the neural network. This method can therefore be called “a method of parallel diagnostics”.

Fig. 16.8. Hyperplanes realized by the two-layer neural network

Fig. 16.9. State graph of the two-layer neural network (Fig. 16.8)

Let us consider the generation of the learning sample for the adaptive diagnostic network synthesis using the example of a two-layer neural network with two neurons in the first layer and one neuron in the second layer (Fig. 16.8).

Here and below, we assume that the neural network contains only one failure. The test is carried out during one cycle of the neural network functioning. Figure 16.9 represents the full state graph of the considered neural network without failures. Figures 16.10-16.12 represent the state graphs for all possible failures of the given class, where $x_{ij}^{l}$ is the value of the j-th input of the i-th neuron in the l-th layer, and $x_{i}^{l}$ is the value of the i-th output of the l-th layer.

Let us divide all the failures into classes corresponding to the neurons in which they occur. The number of classes is equal to the number of neurons plus 1 (the last class is the class of the neural network without failures). Figures 16.10-16.12 represent the state graphs of the two-layer neural network for all failures of the constant type of, respectively, the first neuron in the first layer, the second neuron in the first layer, and the output of the second-layer neuron. Let us compose the aggregates of state graph fault paths (for the failure classes) or failure-free paths (for the failure-free class). All repeated paths are excluded from all the classes. Each fault path corresponds to one failure. All the fault paths form the part of the learning sample that represents the failure classes.

Fig. 16.10. State graphs of the two-layer neural network (Fig. 16.8) for all failures of constant types of the first neuron in the first layer

Fig. 16.11. State graphs of the two-layer neural network (Fig. 16.8) for all failures of constant types of the second neuron in the first layer




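Fig. 16.12. State graphs of the two-layer neural network (Fig. 16.8) for all failures of constant types of the second-layer neuron output

It has the following form in the considered example:

$$\left.\begin{matrix}00\;010\\ 01\;111\\ 11\;010\end{matrix}\right\}1 \qquad \left.\begin{matrix}00\;100\\ 10\;111\\ 11\;100\end{matrix}\right\}2 \qquad \left.\begin{matrix}00\;110\\ 11\;110\\ 01\;011\\ 10\;101\end{matrix}\right\}3 \qquad (16.20)$$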


Let us take all the failure-free paths from the state graph represented in Fig. 16.9. They form the second part of the learning sample, which represents the class of failure-free neural networks:

$$\left.\begin{matrix}00\;111\\ 01\;010\\ 11\;111\\ 10\;100\end{matrix}\right\}4 \qquad (16.21)$$


The numeric characters near the curly braces indicate the class numbers. 
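The generation of the learning sample described above can be sketched by simulating every single failure of the logical constant type (a neuron output stuck at 0 or at 1) and recording the resulting state-graph paths. The 2-2-1 threshold network below is a hypothetical stand-in for Fig. 16.8: its weights are an assumption, chosen so that its failure-free paths coincide with those listed in (16.21), so the fault paths it produces need not match (16.20) exactly. Treating a fault path identical to a failure-free path as "repeated" and dropping it is also an assumption of this sketch.

```python
from itertools import product

def step(weights, theta, x):
    """Threshold neuron with a hard-limiting activation: 1 if the weighted sum exceeds theta."""
    return 1 if sum(w * v for w, v in zip(weights, x)) > theta else 0

# Hypothetical 2-2-1 network (weights, threshold); not the weights of Fig. 16.8.
# Neurons are numbered 1 and 2 (first layer) and 3 (second layer).
LAYER1 = [((1, -1), -0.5), ((-1, 1), -0.5)]
LAYER2 = ((1, 1), 1.5)

def path(x, stuck=None):
    """State-graph path (x1 x2, y1 y2, z) as a bit string.

    `stuck` = (neuron_number, value) forces that neuron's output to a logical constant.
    """
    y = [step(w, t, x) for w, t in LAYER1]
    for k in (1, 2):
        if stuck and stuck[0] == k:
            y[k - 1] = stuck[1]
    z = step(*LAYER2, y)
    if stuck and stuck[0] == 3:
        z = stuck[1]
    return "".join(map(str, (*x, *y, z)))

inputs = list(product((0, 1), repeat=2))
fault_free = {path(x) for x in inputs}

sample = {4: sorted(fault_free)}                   # class 4: failure-free network
for neuron in (1, 2, 3):
    faulty = {path(x, (neuron, v)) for x in inputs for v in (0, 1)}
    sample[neuron] = sorted(faulty - fault_free)   # assumption: drop paths equal to failure-free ones

for cls in sorted(sample):
    print(cls, sample[cls])
```

For this network, no path falls into two different classes, in agreement with Statement 6 below.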
It is evident that all the paths of the state graph in the obtained aggregate (16.20), (16.21) are tops of the unit hypercube of dimensionality

$$N = \sum_{i=0}^{W} H_i,$$

where $H_0$ is the number of network inputs, $H_i$ is the number of neurons in the i-th layer, and W is the number of layers; N = 5 in the considered example. The problem of adaptive diagnostic network synthesis is thus solved as an ordinary pattern recognition problem. The neural network is synthesized from the learning samples (16.20), (16.21) by some adaptive algorithm. This neural network divides the unit N-dimensional hypercube into several compartitions, each consisting of elements of one class. Such a partition is possible with probability one if there are no equal elements in different classes. Let us prove the following statement.


Statement 6. Any two paths in the neural network state graph are different if they correspond to failures of two different neurons.

Let us consider two fault paths in the state graph:

$$a = \{a_{0j_0}, a_{1j_1}, \ldots, a_{Wj_W}\}, \qquad b = \{b_{0j_0}, b_{1j_1}, \ldots, b_{Wj_W}\} \qquad (16.22)$$

They are equal under the following conditions:

$$a_{ij_i}^{k} = b_{ij_i}^{k}, \qquad i = 1, 2, \ldots, W, \quad k = 1, \ldots, H_i.$$

Consequently, equal paths in the state graph have equal tops:

$$a_{0j_0} = b_{0j_0}, \qquad j_0 = 1, \ldots, H_0.$$

Thus, to prove the statement, one must show that the same failure-free path is transformed into different fault paths when the failures occur in different neurons. Let this failure-free path be

$$c = \{c_{0j_0}, c_{1j_1}, \ldots, c_{Wj_W}\} \qquad (16.23)$$


Two cases are possible: (1) the failed neurons belong to different layers, and (2) the failed neurons belong to the same layer.

Let us consider the first case, when the failed neurons are located in the l-th and k-th layers. Then $c_{ij_i} \to a_{ij_i}$ and $c_{ij_i} \to b_{ij_i}$, where the arrows denote the transformation of one node into another under the neuron failures. When the neuron failure occurs in the l-th layer, the nodes $c_{ij_i}$ of the failure-free path (16.23) with the numbers l, l+1, ..., W transform into the nodes $a_{ij_i}$ with the numbers l, l+1, ..., W respectively. When the neuron failure occurs in the k-th layer, the nodes $c_{ij_i}$ transform into the nodes $b_{ij_i}$ with the numbers k, k+1, ..., W respectively. Consequently, one can write

$$c_{ij_i} = a_{ij_i}, \quad i = 1, \ldots, l-1; \qquad c_{ij_i} = b_{ij_i}, \quad i = 1, \ldots, k-1. \qquad (16.24)$$

Let us assume that l < k. Then, according to (16.24), $b_{lj_l} = c_{lj_l}$, whereas $a_{lj_l} \ne c_{lj_l}$ because the failure in the l-th layer changes this node (otherwise the fault path would coincide with the failure-free one). Hence $a_{lj_l} \ne b_{lj_l}$, i.e., the paths a and b are different.

Let us consider the second case, when the failed neurons are located in the same layer. Let the l-th and s-th neurons of the i-th layer be the failed neurons. Then

$$c_{ij_i}^{k} = a_{ij_i}^{k}, \quad k = 1, \ldots, H_i, \ k \ne l; \qquad c_{ij_i}^{k} = b_{ij_i}^{k}, \quad k = 1, \ldots, H_i, \ k \ne s,$$

and consequently $a_{ij_i}^{k} \ne b_{ij_i}^{k}$ for k = l, s (for example, $a_{ij_i}^{s} = c_{ij_i}^{s} \ne b_{ij_i}^{s}$), i.e., the paths a and b are different.

Since no other cases of neural network failures are possible, the statement is proved.
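The condition used above, that different classes contain no equal elements, can be checked directly on the listed learning sample. In the sketch below, the paths of (16.20) and (16.21) are read as five-bit strings (two inputs, two first-layer outputs, one second-layer output) and mapped to corners of the N-dimensional hypercube; the layer sizes $H_0 = 2$, $H_1 = 2$, $H_2 = 1$ are read off the example of Fig. 16.8, and the function names are illustrative.

```python
# Learning sample as listed in (16.20)-(16.21); class 4 is the failure-free network.
sample = {
    1: ["00010", "01111", "11010"],
    2: ["00100", "10111", "11100"],
    3: ["00110", "11110", "01011", "10101"],
    4: ["00111", "01010", "11111", "10100"],
}

layer_sizes = [2, 2, 1]        # H_0 (inputs), H_1, H_2
N = sum(layer_sizes)           # dimensionality of the unit hypercube

def corner(path):
    """Map a path such as '00010' to a vertex of the unit N-dimensional hypercube."""
    return tuple(int(bit) for bit in path)

def shared_elements(sample):
    """Return the hypercube corners that appear in more than one class."""
    owner, shared = {}, set()
    for cls, paths in sample.items():
        for p in map(corner, paths):
            if owner.setdefault(p, cls) != cls:
                shared.add(p)
    return shared

assert all(len(p) == N for paths in sample.values() for p in paths)
print(N, shared_elements(sample))   # 5 set()
```

An empty result means that a partition of the hypercube into single-class compartitions exists for this sample.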

As an example, the adaptive diagnostic neural network was synthesized from the learning samples (16.20), (16.21) by an adaptive algorithm. Its block diagram is represented in Fig. 16.13.

The block diagram is reduced to the form of a neural network with full sequential connections. Connections with zero weights are not shown. The weights between the neurons of the first and second layers are +1. The weighting coefficients of the first-layer neurons are indicated by numbers near the corresponding inputs. The threshold values are indicated inside the rectangles corresponding to the respective neurons. The neural network has four outputs, and it is synthesized so that the appearance of 1 at one of the outputs (with 0 at all the other outputs) means that the input value is assigned to the corresponding class.
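The output convention of the diagnostic network (a single 1 among the four outputs names the class) can be decoded as in the following sketch. The class numbering follows (16.20)-(16.21); the function name and the choice to return None for any other output pattern are assumptions made for illustration, since the text does not say how such patterns are handled.

```python
from typing import Optional, Sequence

def decode_class(outputs: Sequence[int]) -> Optional[int]:
    """Return the 1-based class number if exactly one output is 1 and the rest are 0.

    Classes 1-3 correspond to failures of the three neurons of Fig. 16.8,
    class 4 to the failure-free network. Any other pattern yields None.
    """
    if sorted(outputs) != [0] * (len(outputs) - 1) + [1]:
        return None
    return list(outputs).index(1) + 1

print(decode_class([0, 0, 1, 0]))  # 3
print(decode_class([1, 1, 0, 0]))  # None
```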

The high structural redundancy of the synthesized adaptive diagnostic network is one of the disadvantages of this method. This redundancy is the “payment” for the high operation speed provided by the approach. Another disadvantage arises when the neural network has only one output neuron: in this case, the failures at the outputs of the penultimate-layer neurons and at the corresponding inputs of the last layer are logically indistinguishable. However, the advantage of high operation speed makes this disadvantage insignificant for neural networks with a large number of neurons.

Let us consider the peculiarities of the adaptive diagnostics method. Generating the learning sample requires modeling all the failures of the given class; this is the most time-consuming part of the adaptive diagnostic network synthesis. Automating this process significantly simplifies the synthesis, and the use of advanced synthesis algorithms makes it possible to obtain an optimal implementation of this network.

Realizing both the neural network itself and the adaptive diagnostic network from mono-functional elements with single-type connections is advantageous when the whole device is implemented as a VLSI circuit. It also allows one to use the same synthesis technique for both networks. Since the adaptive diagnostic network synthesis involves neuron redundancy, it is advisable to use this method for devices with high reliability requirements.

The application of the described approach can be extended to the class of parametric failures. To prove this, one must show that all possible fault paths corresponding to parametric failures belong to the set of fault paths for failures of the logical constant type at the neuron inputs and outputs. A check of this condition for the simplest neural networks confirmed its validity; however, its proof in the general case appears to be rather complicated.

Methods of Problem Solving in the Neural Network
Logical Basis


17.1 
Neuromathematics - a New Promising Branch
of Computational Mathematics


Neuromathematics is a branch of computational mathematics dealing with the development of methods and algorithms for solving problems in the neural network logical basis. The objective basis for developing this new branch of computational mathematics is the 30-year store of results in neural network theory, which makes it possible to develop a universal approach to constructing neural network algorithms for problems of general and applied mathematics.

We shall call a computational procedure that can be realized mainly by neural networks of various structures a neuron algorithm (or neural network algorithm). The main task solved by the neurocomputer is fast problem solving.

The first attempts to solve computational problems with the help of neurocomputers date back to the 1960s and 1970s, when pattern recognition was a topical task; it included the problem of function approximation (K classes of patterns in a multidimensional feature space). Thereafter, other attempts were made to solve classical computational problems with the help of neural networks; one example of such a task is matrix inversion. The number of problems solved with the help of neurocomputers grew significantly at the end of the 1980s. One can now speak of the potential universality of neurocomputers. It is clear that any mathematical problem can be solved in the neural network logical basis.

Even seemingly trivial problems (addition, multiplication, division, root extraction, numerical inversion, etc.) can be solved with the help of neurocomputers much more effectively than with conventional Boolean elements.

The range of tasks that can be efficiently solved by neurocomputers is constantly and rapidly widening. The class of general mathematical tasks that neurocomputers can solve efficiently is rather wide. It includes, for example, the following kinds of tasks:


• Systems of linear and nonlinear algebraic equations and inequalities;

• Function approximation and extrapolation;

• Optimization tasks (linear, nonlinear, and dynamic programming; combinatorial tasks; the traveling salesman problem; timetabling; various graph problems);

• Ordinary nonlinear differential equations;

• Partial differential equations.




Various transformations can be realized directly in the neural network logical basis, whereas their implementation on classical computers requires special algorithms (for example, algorithms computing direct and inverse trigonometric and exponential functions, which serve as activation functions in the neural network approach).

An especially important part of general neuromathematics is a complex of problems related to graphs. In particular, these are the problems of formalizing the search for and calculating routes, cycles and cutsets in graphs, and the problems of graph partitioning, drawing and layout.

The class of problems of general neuromathematics will evidently grow in the near future. The number of scientific studies on neural network algorithms worldwide is rather large. The distinctive feature of the Russian school of neuromathematics is its use of effective scientific results in the field of neural networks and of corresponding effective methods for developing neural network algorithms adequate to the specific problem.

The development of neuromathematics was initiated not by mathematicians but by specialists in control theory and neurocomputers, against the background of the “nonverbal behavior” of narrowly specialized workers in computational mathematics. The methods of control theory, analytical self-adaptive systems and adaptive filtering formed the basis of the methods for developing neural network algorithms. Neural network algorithms for different mathematical tasks are rather “similar” because many common problems arise in developing neural network algorithms across a wide variety of solutions. Most of these common problems are either not taken into consideration or passed over in silence.

The objective reasons for the transition to neural network algorithms for solving different tasks are the following:


• The inability of computational systems of other architectures to solve complex tasks of general and applied mathematics in a given time (at equal cost of the neurocomputers and these computational systems);

• The objective necessity of the neural network algorithm and its adequacy to the task under consideration.


This results in the following main distinctions between a neural network algorithm and any other one:


• Super-high parallelism (the parallelism of neural network algorithms is always higher than that of the technical facilities implementing them);

• A high performance-to-cost (or performance-to-size) ratio of the technical facilities implementing neural network algorithms.


As neuromathematics (the part of computational mathematics that solves the tasks of general and applied mathematics in the neural network logical basis) develops, neurocomputers will lay claim to the role of universal computational systems.

The main reasons for writing the present article are the following:

• The primitiveness of the neural network algorithms used at the initial stages of solving tasks (as a rule, once a certain solution quality has been achieved, there are no methods for improving it further);

• The necessity of developing neural network algorithms adequate to the task under consideration within a unified technique for solving tasks in the neural network logical basis.


A large number of the known works in the domain of neural network theory, neuromathematics, neural control and neurocomputers can be conditionally divided into two groups. The first group resulted from the general reflections of various authors who are interested in these problems and want to improve solutions they have found in scientific publications or have invented on the basis of some general ideas. The second group deals with the development of ideas born in the process of solving specific problems. Long-term practice shows that genuine and serious theoretical statements and studies in this field are developed precisely in the second group of works. This is not simply an appeal for more active solution of practical problems, but the result of a long-term analysis of a large number of theoretical studies in this field.

The present study defines, in some sense, the logical pathway for developing neural network problem solution algorithms and can serve as a basis for creating an intelligent software package implementing these algorithms.


17.2 
Neural Network Theory - A Logical Basis for the Development 
of the Neural Network Problem Solution Algorithms 


Neural network theory is the logical basis for solving the tasks of general and applied mathematics, in the same way as Boolean logic was earlier the basis for solving tasks on computers with von Neumann architecture.

A neural network is a network with a finite number of layers consisting of elements of a single type. Each element is similar to a neuron, and various types of connections between layers are possible. The number of neurons in the layers must provide the given quality of the solution, and the number of layers must be as small as possible in order to minimize the solution time.

The main properties of neural networks are given below:


• Homogeneous neural networks degrade gradually when separate elements break down. This was shown by Rosenblatt, who constructed a three-layer perceptron with random connections in the first layer and a redundant number of elements in this layer. Here, the function realized by the neural network is distributed across the structure;

• The structure of a homogeneous neural network makes large-scale parallelism possible when a large number of synchronous operations (addition, multiplication, and fast nonlinear transformation) are performed. The neural network structure does not contain the complicated and “long” irrational operations on the operands (division, root extraction, etc.) used by algorithms for single-processor computers;

• Neural networks implement a rather flexible and complex functional transformation of the input state space into the output one. The flexibility of this transformation can be controlled by the number of layers and the type of connections used;

• The neural network structure allows for an analytical description of the transformation of the input space into the output one;

• The previous property allows for analytical adjustment of the neural network and for control of the algorithm functioning during the solution;

• The complexity of the neural network used to solve a particular problem reflects the complexity of the problem itself;

• In the future, the use of linear sequential machines in the sense of Gill will allow the neural network structure to be used for solving the problem of analytical description and for synthesizing adaptation algorithms in multilayer neural networks.


The main advantages of neural networks as the logical basis of complex problem solution algorithms are the following:


• Invariance of the neural network synthesis procedure with respect to the dimensionality and size of the feature space;

• Correspondence to modern, cutting-edge microelectronics technology;

• Fault tolerance, in the sense of a monotonic rather than catastrophic change in problem-solution quality depending on the number of failed elements.


The postulatory base of neural network theory is the stochastic Bayesian model of the external world. Accordingly, the input signal is formed through the pattern channel and the channel of supervisor instructions. In general, the input signal is a nonstationary random signal with a complex, unknown, multimodal probability density.


17.3 
Selection of the Problems Adequate to the Neural Network Logical Basis 


The ultimate goal of the present study is the design of a software package in the neural network logical basis. Such a package is needed when the development engineer working on the solution algorithm has already decided that the neural network approach is necessary and has convinced himself of this necessity.

The investigator begins developing a neural network algorithm from the physical problem statement. The physical problem statement must be described not verbally but in a specific document describing the initial data and the essence of the physical result to be obtained from the calculations. If the developer of the neural network algorithm receives the physical problem statement from another person, he must refine the problem statement together with its originator. In any case, the problem originator need not be a specialist in neural network algorithms.

All the problems in the physical (not mathematical) problem statement are divided into two classes: unformalized and formalized problems. Unformalized problems are problems that cannot be formalized in mathematical terms, formulas, structures, graphs, etc. As mentioned above, the number of such problems is constantly growing. These problems are usually complex or hypercomplex, and they can be solved only within the neural network approach. Formalized problems are problems that can be represented as a system of linear or nonlinear algebraic equations, a system of ordinary nonlinear differential equations, or a system of partial differential equations.

The question of which class of problems can be solved most efficiently by computing devices designed according to new principles is always topical. For a long time, neurocomputers were considered efficient for solving unformalized or ill-formalized problems, which necessarily include algorithms with a learning procedure using real experimental data.

One of the main problems of this type is the approximation of functions with a discrete range of values, i.e., the problem of pattern recognition. Unformalized problems are evidently an important argument for the use of neurocomputers. However, it must be remembered that pattern recognition is only a special case of function approximation, and that here flexible nonlinear (neural network) approximation methods are used rather than statistical ones (regression models).

At the present time, a new class of problems with pronounced natural parallelism has emerged (signal processing, image processing). This class of problems does not require a learning procedure using experimental data; nevertheless, it is well represented in the neural network logical basis.

It is also efficient to use neural network algorithms for problems whose input information space (input data) is generated by the Monte Carlo method rather than analytically.

The question of the efficiency and necessity of representing and solving a given class of problems in the neural network logical basis is very important.

The first such problem class was pattern recognition. Many different algorithms and computer architectures were used to solve it in the 1960s; in the 1970s and 1980s, neural network algorithms became dominant.

The second problem class in which neural network algorithms dominate is function approximation and extrapolation. At present, the main problem here is the method of developing the neural network solution algorithm in each particular case.

The third problem class, in which the advantages of neural network algorithms have been proven in practice, is dynamic system control, or neural control. The two main tasks here, identification of the dynamic object (analysis) and construction of correcting filters in the control loop (synthesis), can be solved in the neural network logical basis; this makes unnecessary the approximation methods for solving nonlinear differential equations that are oriented toward von Neumann computers.

The aforementioned problems can be conditionally divided into two groups: the first is adequate to the neural network logical basis, and the second is “general”. The time required to solve “general” problems of large dimensionality on von Neumann computers or transputer-like (cluster) computers can exceed the admissible time. Achieving the admissible time in this case may require exceeding the admissible capacity or cost of the computer system. This gives rise to the need to develop neural network algorithms and neural network hardware.

The necessity of solving mathematical problems of high dimensionality arises, as a rule, in practical tasks related to high technologies in various scientific studies, industry and economics. It is precisely the widespread development and application of neurocomputers that indicates the development of high technologies.

Problems that cannot be solved by the computational facilities of the current development level have always existed in the history of computer engineering. At present, the transfer to the neural network logical basis is generally made when the dimensionality of the solution space increases sharply or when a sharp decrease of the solution time is required.

Iterative algorithms are the natural choice for problems of high dimensionality. The known iterative solution algorithms in the neural network logical basis, for example, the algorithms for solving systems of linear algebraic equations, are often rather primitive and consist of only one layer, which decreases the quality of problem solving. The use of neural networks with different structures, including neural networks with feedback connections, opens broad prospects for development in this field of neuromathematics.
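For illustration, a one-layer iterative scheme of the kind mentioned above can be sketched for a system $Ax = b$: each unknown is the state of one neuron, and all states are updated in parallel by gradient descent on the squared residual, $x \leftarrow x - \eta A^{T}(Ax - b)$. This is a generic textbook scheme, not a specific algorithm from the text; the step size $\eta$ and the iteration count are assumptions, and $\eta$ must be smaller than $2/\lambda_{\max}(A^{T}A)$ for the iteration to converge.

```python
def solve_linear_system(A, b, eta=0.05, iterations=500):
    """One-layer gradient iteration for A x = b (A given as a list of rows).

    Every component x_j acts as a neuron; all components are updated
    simultaneously from the residual r = A x - b.
    """
    n = len(A[0])
    x = [0.0] * n
    for _ in range(iterations):
        # residual of each equation
        r = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
        # gradient of 0.5 * ||A x - b||^2 is A^T r
        grad = [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(n)]
        x = [xj - eta * gj for xj, gj in zip(x, grad)]
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(solve_linear_system(A, b))   # close to [1/11, 7/11], about [0.0909, 0.6364]
```

Here $A^{T}A$ has eigenvalues of roughly 21.3 and 5.7, so $\eta = 0.05$ satisfies the convergence condition; a single-layer scheme like this converges slowly when $A$ is ill-conditioned, which is one reason the text points toward richer structures.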

The expansion of the class of problems solved in the neural network logical basis can be efficiently estimated by the performance-to-cost ratio as compared with classical von Neumann computers. This estimate shows that neurocomputers are close to the class of general-purpose computers.

It is assumed that the algorithms and programs will be efficiently used on any existing and prospective neurocomputer and will form the basis of future mathematical program libraries for neurocomputers, i.e., the basis of the applied software for prospective neurocomputers as general-purpose computers. The developed algorithmic software will constitute the basis of neuromathematics.

It must be mentioned that neural network solution algorithms for different problems are often “similar” to each other. They have a canonical neural network structure that is tailored to a particular problem by selecting the number of layers, the number of neurons in the layers, and the neural network adjustment procedure. Therefore, development engineers and users can compare different algorithms objectively and quantitatively.

We believe that any problem can be solved with the help of a neurocomputer much more effectively than with a conventional computer, because the algorithm for any problem can be represented in the neural network logical basis with a controlled number of neural layers and a minimized number of iterations of the adjustment procedure.

This means that, at the logical level, the neural network algorithm for solving any problem is much more parallel than any of its physical implementations. This property distinguishes neural computers from such systems as transputer-like ones, in which software designers usually modify solution algorithms initially developed for single-processor computers. These modifications aim to minimize the overhead of information exchange between processors during problem solving.

In view of these remarks, some comments on Fig. 17.1 are necessary. The figure
represents the logical structure of the procedure for selecting the problems adequate
to the neural network logical basis. As mentioned above, all problems can be divided
into two types: formalized and unformalized. The author considers that unformalized
problems can, in practice, be solved only in the neural network logical basis.

After the neural network solution algorithm for an unformalized problem has been
developed and programmed on a workstation computer, and the dependence of the
solution time on the neural network parameters (in particular, on the problem
dimensionality) has been analyzed, one can determine whether the solution time is
acceptable to the customer. If it is, the neurocomputer implementation is simply the
workstation program. Otherwise, there are, in practice, only two possible decisions:


1. Design of a hardware accelerator for neural network problem solving, based on a
technology chosen according to the customer’s requirements on the duration of the
development work and on the weight, size, and cost of the hardware unit.

In this case, for the particular technology selected for the neurochips and
neuro-plates, one can approximately calculate the number of neurochips and
neuro-plates in the hardware accelerator. The neuro-plate, unit, or rack of the neural
network hardware accelerator, together with the host computer, then constitutes the
neurocomputer implementation.

2. If the requirements on the duration of the development work are strict and there
are no requirements on the weight, size, and cost of the hardware unit, one can develop
a program for a cluster computer that parallelizes the neural network algorithm over
several processors. The number of processors required in this case can be estimated
approximately, because the parallelization itself is controlled by the neural network
algorithm: the algorithm can ensure uniform loading of the processors and minimize
the cost of information exchange between them. The neurocomputer implementation
is then the cluster computer program realizing the parallel neural network algorithm.
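The approximate count of neurochips or processors mentioned in both decisions can be illustrated with a back-of-the-envelope calculation. This is a sketch of our own, not the book's procedure: it assumes that the workload can be expressed as a number of operations per solution and that a single, hypothetical parallel-efficiency factor captures the losses due to information exchange between units. All names and figures are illustrative.

```python
import math


def estimate_units(ops_per_solution: float, required_time_s: float,
                   unit_ops_per_s: float, efficiency: float = 1.0) -> int:
    """Approximate number of neurochips (decision 1) or processors (decision 2).

    All inputs are hypothetical planning figures: the operation count of one
    solution, the solution time the customer requires, the sustained throughput
    of a single unit, and an assumed parallel efficiency in (0, 1] that lumps
    together the losses caused by information exchange between units.
    """
    if required_time_s <= 0 or unit_ops_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid sizing parameters")
    required_rate = ops_per_solution / required_time_s   # ops/s the customer needs
    effective_rate = unit_ops_per_s * efficiency          # ops/s one unit really delivers
    return max(1, math.ceil(required_rate / effective_rate))


# Hypothetical workload: 1e12 operations per solution, 1 s required time,
# units rated at 1e11 ops/s with 50 % parallel efficiency.
print(estimate_units(1e12, 1.0, 1e11, efficiency=0.5))   # 20
```

The efficiency factor is where the two decisions differ in practice: the better the algorithm controls processor loading and data exchange, the closer this factor can be assumed to be to 1.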


Computational mathematics deals with the solution of formalized problems. If the
customer is satisfied with the speed provided by the solution algorithm in the
classical logical basis adequate to the von Neumann architecture, there is no need to
use the neural network logical basis. However, the development of advanced
technology and the growing complexity of formalized problems due to increasing
dimensionality often make the solution time on workstation computers with classical
algorithms unsatisfactory. In this case, as mentioned above, the designer has only two
possible decisions:


1. To develop a neural network algorithm for the formalized problem and then follow
the procedure described above for unformalized problems (see [17-4] and the
references therein, as well as [17-6] and the references given there in the section
“Neuromathematics”);

2. To develop a program for a cluster computer that parallelizes the classical
algorithm. This method is used by the majority of cluster computer users. However,
one must take into account whether the designer is solving
- Either a purely scientific problem without constraints on the weight, size, and cost
of the computer, i.e., using whatever cluster computer is available to him;

- Or a practical problem with significant constraints on the weight, size, and cost of
the computer. In that case, the designer is often forced to use the neural network
logical basis and to develop a neurocomputer, i.e., to elaborate a neural network
solution algorithm.


Notice that even when purely scientific problems are solved without constraints on
the weight, size, and cost of the computer, it is sometimes necessary to use the neural
network logical basis to optimize the distribution of the load between the processors
of the cluster computer. Thus, Fig. 17.1 shows a set of possibilities for neurocomputer
implementation, together with a reasonable selection of the problems adequate to the
neural network logical basis.
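The selection logic described for Fig. 17.1 can be summarized as a small decision function. This is a simplified reading of the text, not a reproduction of the figure. The boolean fields of the hypothetical `Problem` record stand in for the customer's requirements. For a formalized problem without weight, size, and cost constraints, the text leaves both options open; the sketch picks the parallelized classical algorithm, which the text says most cluster users choose.

```python
from dataclasses import dataclass
from enum import Enum


class Implementation(Enum):
    CLASSICAL_WORKSTATION = "classical algorithm on a workstation computer"
    NEURAL_WORKSTATION = "neural network algorithm on a workstation computer"
    HARDWARE_ACCELERATOR = "neural network hardware accelerator with host computer"
    CLUSTER_NEURAL = "parallel neural network algorithm on a cluster computer"
    CLUSTER_CLASSICAL = "parallelized classical algorithm on a cluster computer"


@dataclass
class Problem:
    formalized: bool = False
    classical_time_ok: bool = False     # classical algorithm fast enough on a workstation
    neural_time_ok: bool = False        # neural algorithm fast enough on a workstation
    strict_deadline: bool = False       # strict requirement on development duration
    hardware_constraints: bool = False  # requirements on weight, size, and cost


def select_implementation(p: Problem) -> Implementation:
    if p.formalized:
        if p.classical_time_ok:
            return Implementation.CLASSICAL_WORKSTATION  # neural basis not needed
        if not p.hardware_constraints:
            return Implementation.CLUSTER_CLASSICAL      # choice of most cluster users
        # Otherwise develop a neural network algorithm and continue
        # exactly as for an unformalized problem.
    if p.neural_time_ok:
        return Implementation.NEURAL_WORKSTATION
    if p.strict_deadline and not p.hardware_constraints:
        return Implementation.CLUSTER_NEURAL
    return Implementation.HARDWARE_ACCELERATOR


print(select_implementation(Problem(formalized=True, hardware_constraints=True,
                                    neural_time_ok=True)).value)
```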


17.4
The General Structure of the Program Package for Problem Solution 
in the Neural Network Logical Basis 


The foundation of the unified method for the problem solution in the neural network 
logical basis is the method of the adaptation algorithm synthesis for the multilayer neural 
networks. According to this method, the following adjustment algorithms for the mul- 
tilayer neural networks were developed: 


- Neural networks for the general operating modes (learning, self-learning, learning
with a supervisor of finite qualification, etc.);

- Neural networks for a wide class of primary optimization criteria (minimum of the
average risk function, minimum of the average risk function under constraints on its
components, maximum of the a priori probability, maximum of the a posteriori
probability, etc.);

- Neural networks for a wide class of secondary optimization functionals (gradient,
gradient with memory, combination of the gradient procedure with random search
for the selection of initial conditions, etc.);

- Neural networks for different multilayer neural network structures (with an
arbitrary number of neuron layers, with complete sequential, cross, and feedback
connections, etc.).


The following principles form the basis of the development of neural network
solution algorithms:

- Rejection of the known neural network program packages and paradigms;

- Synthesis of neural network algorithms adequate to each particular mathematical
problem;

- Synthesis of neural network algorithms and of the structures of the adjusted neural
networks that are not imposed on the stated problem but allow flexible selection of
the desired structure, aimed at improving the quality of the problem solution.


By problem-solving quality we mean the precision of the solution and the operation
speed, the latter determined, in particular, by the number of iterations in the
adaptation procedure of the neural network.

The general methods of solving mathematical problems in the neural network logical
basis were described in [17-2]. There, the neural network solution algorithms are
presented within the complete structure defined by the methods of synthesis of
multilayer neural networks, which include the following stages of problem statement:

- Physical and geometrical;
- Mathematical;
- Neural network.
The neural network problem statement, in turn, includes the following stages:

- Description of the initial data;

- Determination of the input signal x(n) of the neural network;

- Generation of the primary optimization functional of the neural network for the
solution of the problem;

- Determination of the output signal y(n) of the neural network;

- Determination of the desired output signal of the neural network;

- Determination of the neural network error signal vector for the solution of the problem;

- Generation of the secondary optimization functional of the neural network in terms of
the signals in the system;

- Selection of the method of searching for the extremum of the secondary optimization
functional;

- Analytical determination of the transformation performed by the neural network;

- Selection of the particular structure of the neural network;

- Search for the analytical expression of the gradient of the secondary optimization
functional with respect to the adjustment parameters;

- Generation of the neural network adjustment algorithm for the solution;

- Selection of the initial conditions for the neural network adjustment;

- Selection of the typical input signals for verifying the solution procedure;

- Development of the plan of experiments.
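To show how these stages fit together, the sketch below walks through them for the simplest possible case: a single linear threshold neuron separating two classes. The particular choices are ours and only illustrative. The mean square error serves as the secondary functional, its extremum is sought by plain stochastic gradient descent (the Widrow-Hoff LMS rule), the initial conditions are zero weights, and the "plan of experiments" is reduced to counting misclassified training points.

```python
import random


def train_neuron(samples, mu=0.01, epochs=100, seed=0):
    """Adjust one linear threshold neuron by stochastic gradient descent on e^2 / 2.

    samples: list of (x, eps) pairs, x a tuple of floats, eps in {-1, +1}
    (the supervisor instruction, i.e. the desired output).
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])   # initial conditions: zero weights
    b = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, eps = samples[i]
            y = sum(wi * xi for wi, xi in zip(w, x)) + b   # output of the linear part
            e = eps - y                                     # error signal
            # The gradient of e^2 / 2 with respect to w is -e * x: step against it.
            w = [wi + mu * e * xi for wi, xi in zip(w, x)]
            b += mu * e
    return w, b


def classify(w, b, x):
    """Threshold output of the trained neuron."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Typical input signals for a test: two well-separated groups of points.
data = [((2, 2), 1), ((2, 3), 1), ((3, 2), 1), ((3, 3), 1),
        ((-2, -2), -1), ((-2, -3), -1), ((-3, -2), -1), ((-3, -3), -1)]
w, b = train_neuron(data)
errors = sum(classify(w, b, x) != eps for x, eps in data)
print(errors)   # 0
```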


The stages of neural network solution algorithm synthesis listed above determine the
complete flow of the user’s work with the program package (Fig. 17.2).

Figure 17.2 (see p. 372/373) represents the current version of the general structure of
the program package for problem solution in the neural network logical basis. This
structure is the pathway for the development of a neural network solution algorithm
and can serve as the basis for designing the menu of the program package.

Once the designer has decided to use a neural network solution algorithm, two types
of neural networks can be used:

- Neural networks with flexible (variable) structure [17-1];
- Neural networks with fixed structure.


The author believes that there is no third possibility at the present time. 

It must be mentioned that in the Russian school of neural network solution algorithms,
the solution process in both cases was regarded as a dynamic process in an essentially
nonlinear neural network environment. This approach was formed on the basis of the
general theory of adaptive search and analytical systems.


17.5 
Multilayer Neural Networks with Flexible Structure 


The main advantage of multilayer neural networks with flexible structure is that no
a priori information about the neural network structure (the number of layers and the
number of neurons in the layers) is required. This structure is formed in the process of
neural network adjustment, i.e., in the process of solution. The obtained structure
indirectly reflects the complexity of the problem: the more complex the trained neural
network (in the number of layers and the number of neurons in the layers), the more
complex the problem.

Neural networks with flexible structure [17-1], [17-2], and [17-3] can be used efficiently
for different types of input feature space:

- Binary, when the components of the input N-dimensional vectors take the values 0
and 1;

- K-valued components of the N-dimensional vector;

- Real-valued components of the N-dimensional vector.

Multilayer neural networks with flexible structure have mostly been used for
N-dimensional spaces of real-valued features, when the input information consists of
continuous signals over some fixed time interval.

The limitation of multilayer neural networks with flexible structure is that they can be
used only for the following classes of problems:

- Recognition of two classes of patterns;

- Recognition of K classes of patterns (by generating K neural networks, each of which
recognizes the k-th class, k = 1, ..., K, against all the others);

- Self-learning (clustering): the input sample presented for clustering, without supervisor
instructions about class membership, serves as the sample of the first class for the
multilayer neural network with flexible structure, whereas the output of a white noise
generator serves as the sample of the second class.
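The two reductions in this list can be sketched as data preparation steps. Both functions are hypothetical helpers. The first forms the supervisor instructions for the k-th of the K two-class networks. The second builds the two-class sample for self-learning, assuming that the white noise generator draws points uniformly over the bounding box of the unlabelled sample; the text does not specify the noise distribution.

```python
import random


def one_vs_rest_labels(labels, k):
    """Supervisor instructions for the k-th two-class network: +1 for class k, -1 otherwise."""
    return [1 if c == k else -1 for c in labels]


def self_learning_training_set(sample, seed=0):
    """Two-class set for clustering: the unlabelled sample is class +1, noise is class -1.

    Assumption: the white noise generator draws independent points uniformly
    over the bounding box of the sample, one noise point per sample point.
    """
    rng = random.Random(seed)
    dims = len(sample[0])
    lows = [min(p[d] for p in sample) for d in range(dims)]
    highs = [max(p[d] for p in sample) for d in range(dims)]
    noise = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in sample]
    return [(p, 1) for p in sample] + [(q, -1) for q in noise]


print(one_vs_rest_labels([0, 1, 2, 1], k=1))   # [-1, 1, -1, 1]
print(self_learning_training_set([(0.0, 0.0), (1.0, 2.0), (2.0, 1.0)]))
```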


The number of problems solved with multilayer neural networks with flexible structure
can be expected to increase in the future.

In the adjustment of a multilayer neural network with flexible structure, the first layer
is trained first, and the required number of neurons $H_1$ of this layer is determined in
the process. The results of the first-layer adjustment are then used to adjust the second
and third layers. In two-class pattern recognition, the number of layers is 2 or 3, and
there is a single neuron at the output.
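A first layer whose size $H_1$ is found during adjustment can be illustrated with a simple constructive rule. This sketch is not the algorithm of [17-1]. It adds a threshold neuron (the perpendicular bisector of a conflicting pair of points) whenever two points of different classes still receive the same binary code, and it replaces the trained second and third layers with a lookup table from codes to classes.

```python
def grow_first_layer(points, labels):
    """Add first-layer threshold neurons until the codes of different classes no longer collide.

    Hypothetical growth rule: for the first pair of differently labelled points that
    share a code, add the hyperplane w.x = t that is the perpendicular bisector of
    the segment joining them; it always separates that pair.
    Returns the neurons [(w, t), ...], a code -> class table, and the coding function.
    """
    neurons = []

    def code(x):
        return tuple(int(sum(wi * xi for wi, xi in zip(w, x)) > t) for w, t in neurons)

    while True:
        conflict = next(((a, b) for i, a in enumerate(points)
                         for j, b in enumerate(points)
                         if i < j and labels[i] != labels[j] and code(a) == code(b)), None)
        if conflict is None:
            break
        a, b = conflict
        w = [bk - ak for ak, bk in zip(a, b)]
        if not any(w):
            raise ValueError("identical points carry different labels")
        t = sum(wk * (ak + bk) / 2 for wk, ak, bk in zip(w, a, b))   # through the midpoint
        neurons.append((w, t))
    # Stand-in for the second and third layers: every code now belongs to one class.
    table = {code(x): c for x, c in zip(points, labels)}
    return neurons, table, code


# XOR cannot be separated by a single threshold neuron.
points = [(0, 0), (1, 1), (0, 1), (1, 0)]
labels = [0, 0, 1, 1]
neurons, table, code = grow_first_layer(points, labels)
print(len(neurons))   # 2
```

A lookup table generalizes poorly to codes never seen during adjustment; a full implementation would train the subsequent threshold layers instead.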

Then the stages of developing tests to verify the quality of the trained neural network
are executed. After that, the plan of experiments for investigating the quality of the
neural network performance is elaborated. These stages are common to multilayer neural
networks with flexible and fixed structures, and they will be considered below, after the
neural networks with fixed structure.

To conclude the discussion of multilayer neural networks with flexible structure, note
that at present these networks are used to solve a relatively narrow class of problems,
including the recognition of patterns of two or K classes and clustering (self-learning).
Potentially, such neural networks could be used for wider classes of problems (function
approximation and extrapolation, etc.).




Chapter 17 - Methods of Problem Solving in the Neural Network Logical Basis 


17.6 
Neural Network with Fixed Structure 


In the author’s opinion, in contrast to neural networks with flexible structure, neural
networks with fixed structure can be used to solve any problem that satisfies the
aforementioned selection criterion No. 3. The price of this universality in the range of
solvable problems is the restriction to a neural network structure selected a priori. In
this case, the neural network structure is one of the components of the vector that
includes all the types of a priori information required for developing the neural network
solution algorithm. A complete description of this vector will be given in the conclusion
of the present study. The stages of developing a neural network solution algorithm with
neural networks of fixed structure are the following:


1. Generation of the input signal, including the formation of the supervisor instructions;

2. Generation of the output signal;

3. Formation of the primary optimization functional;

4. Generation of the open neural network structure;

5. Formation of the secondary optimization functional;

6. Formation of the search algorithm for the secondary optimization functional extremum;

7. Formation of the algorithm for the adaptation of the coefficients of the multilayer
neural network with fixed structure;

8. Development of tests for the verification of the performance quality of the trained
neural network;

9. Elaboration of the plan of experiments for the verification of the performance quality
of the trained neural network.


The last two items in this list are common for the neural networks of flexible as well 
as fixed structures. 


17.6.1 
Generation of the Input Signal of the Neural Network 


This problem is not trivial, and sometimes it has not one but several solutions. It can be
formulated relatively simply in pattern recognition tasks where the patterns are already
represented by feature vectors. However, in particular problems of signal or pattern
recognition, the generation of these patterns is rather complicated. The problem is
complicated even in such a transparent task as function extrapolation, because of the
additional parameter (the filter memory) and the special method needed to generate the
supervisor instructions for the neural network. Similar problems exist in the development
of neural network equalizers, neural control systems, etc.
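For function extrapolation, one common way to form the input signal, assumed here rather than taken from the book, is a sliding window. The filter memory m fixes how many past samples make up the pattern x(n), and the supervisor instruction ε(n) is the sample that immediately follows the window. The helper name is hypothetical.

```python
def extrapolation_patterns(series, memory):
    """Turn a scalar series s(0), ..., s(N-1) into pairs (x(n), eps(n)).

    x(n) holds the last `memory` samples (the filter memory); the supervisor
    instruction eps(n) is the next sample, which the network must predict.
    """
    if memory < 1 or memory >= len(series):
        raise ValueError("memory must be in [1, len(series) - 1]")
    return [(tuple(series[n - memory:n]), series[n]) for n in range(memory, len(series))]


print(extrapolation_patterns([1, 2, 3, 4, 5], memory=2))
# [((1, 2), 3), ((2, 3), 4), ((3, 4), 5)]
```

Changing `memory` changes both the input dimension of the network and the number of available training pairs, which is why the filter memory is an additional design parameter.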

The input signal of the neural network is the signal [x(n), ε(n)], where x(n) is the series of
input patterns and ε(n) is the supervisor instruction about the class to which the pattern
x(n) belongs. Both x(n) and ε(n) can be represented in different ways depending on the
particular problem statement. The series x(n) can be a vector of real-valued variables, a
function of some argument, a vector function, etc.; the series ε(n) can be a real-valued
variable that takes two, K, or a continuum of values, or it can be a vector or a vector
function.

The probabilistic approach to the perception of the outside world makes it necessary to
represent the input signal of the neural network by the joint distribution function
f(x, ε). The detailed form of this function can differ considerably from case to case. Most
investigations of neural network learning algorithms assume that the supervisor
qualification is complete, i.e., that the supervisor can determine exactly whether a
particular pattern belongs to a given class.


17.6.1.1 
About the Supervisor Qualification 


In practice, however, problems with limited supervisor qualification also exist, and they
have not been sufficiently investigated.

The elaboration of neural network algorithms adequate to the real conditions under which
information for their adjustment is obtained requires an estimate of the real supervisor
qualification. For this reason, in addition to the widespread learning modes of multilayer
neural networks, which assume that the supervisor knows the class of each pattern with
unit probability, one must consider in more detail three more learning modes introduced in
[17-2]:


- Learning with a supervisor having zero qualification (self-learning, clustering);

- Learning with a supervisor having finite qualification;

- Learning with a supervisor having negative qualification (the “harm” mode, in which the
supervisor deliberately gives false information about the class to which a pattern belongs).


Neural networks in the self-learning mode. Clustering. In spite of the long history of this
problem, it still remains poorly investigated. The main task here is to process a set of
multidimensional vectors in order to select, according to a certain rule, compact groups of
vectors termed clusters. Research in this area was not active in recent decades because of
the absence of socially significant problems in which the self-learning mode would play an
important role. Today, however, such problems have begun to appear. From our viewpoint,
the most significant one is information compression (compression of images, speech, etc.),
whose solution by existing classical methods has reached the real limits of their capacity.
Within the scope of this problem, the accompanying tasks mentioned above must also be
developed further, namely:

- The typical input signals;
- The initial adjustment conditions;
- The control over the iteration procedure parameters, etc.


Neural networks in the mode of learning with a supervisor having finite qualification.
Since the 1970s, in the design of specific electrocardiogram recognition systems, it has
been noted that during the verification of electrocardiogram archives, an expert physician
or a group of expert physicians cannot assign some electrocardiograms to a particular class
of diseases with full reliability. Investigating the dynamics and the results of multilayer
neural network adjustment as a function of the actual supervisor qualification is an
important subject for future research in neural network theory.


Neural networks in the “harm” mode of learning. The operation mode in which the
supervisor deliberately gives false information about the class to which a pattern belongs
is completely unstudied. This mode will probably be used under conditions of information
warfare, with multilayer neural networks employed to estimate the information security of
corporate systems and regions when information about the use of weapons is involved.

Notice that in the known scientific literature, two different operating modes of neural
networks, namely the learning mode (complete supervisor qualification) and the
self-learning mode (zero supervisor qualification), are treated independently of each
other. In the proposed methods, these two modes differ only in the value of a parameter.
Varying this parameter makes it possible to consider many new modes.
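The idea that learning, self-learning, and the “harm” mode differ only in the value of one parameter can be illustrated with a simple label-noise model. This parametrization is an assumption for illustration, not the one used in [17-2]. A hypothetical supervisor of qualification q in [-1, 1] reports the true class (+1 or -1) with probability (1 + q)/2 and the opposite class otherwise.

```python
import random


def supervisor(true_labels, qualification, seed=0):
    """Hypothetical supervisor with qualification q in [-1, 1].

    Each reported label equals the true label (+1/-1) with probability (1 + q) / 2
    and is flipped otherwise: q = 1 is complete qualification (ordinary learning),
    q = 0 carries no information about class membership (as in self-learning),
    and q = -1 always reports the wrong class (the "harm" mode).
    """
    if not -1.0 <= qualification <= 1.0:
        raise ValueError("qualification must lie in [-1, 1]")
    rng = random.Random(seed)
    p_true = (1.0 + qualification) / 2.0
    return [c if rng.random() < p_true else -c for c in true_labels]


labels = [1, -1] * 5000
for q in (1.0, 0.5, 0.0, -1.0):
    reported = supervisor(labels, q)
    agreement = sum(r == c for r, c in zip(reported, labels)) / len(labels)
    print(q, agreement)
```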


17.6.1.2 
Taking into Account A Priori Probabilities of the Classes’ Emergence 


The need to take into account the a priori probabilities of class occurrence arises in
various practical tasks. A typical example is letter recognition in the printed text of a
scanned document when the probability of occurrence of each letter is known. The
possibility of using the a priori probabilities of class occurrence during the adjustment of
multilayer image recognition systems is investigated in [17-2], where it was consistently
used in the construction of various particular systems. However, this technique requires
additional investigation before it can be used efficiently.
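In the simplest Bayesian reading, taking a priori probabilities into account means weighting each class likelihood by the class prior before choosing the class. The sketch below shows that rule for the letter-recognition example. The likelihood and frequency values are invented for illustration and are not real letter statistics.

```python
def map_decision(likelihoods, priors):
    """Maximum a posteriori decision: the class c maximizing priors[c] * likelihoods[c].

    likelihoods[c] stands for f(x | c) evaluated at the observed pattern x, and
    priors[c] for the known a priori probability of class c.
    """
    return max(likelihoods, key=lambda c: priors[c] * likelihoods[c])


# Illustrative numbers only: an ambiguous glyph that looks slightly more like "c",
# while "e" is assumed to occur four times as often in the text.
likelihoods = {"e": 0.30, "c": 0.35}
print(map_decision(likelihoods, {"e": 0.12, "c": 0.03}))   # e
print(map_decision(likelihoods, {"e": 0.5, "c": 0.5}))     # c
```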


17.6.1.3 
Continuum of Classes 


The general representation of the input signal for multilayer neural networks with two, K,
and a continuum of pattern classes in the learning mode with limited supervisor
qualification makes it possible to use neural networks for the quantitative estimation of
the state of an object described by the signal x(n). In this case, the series ε(n) is
real-valued and varies within some bounded interval.


17.6.1.4 
About the Nonstationarity of the Input Signal 


In most practical problems, the neural network input signal is considered to be
stationary, with some unknown and complex distribution function f(x, ε). However,
problems with a nonstationary input signal f(x, ε) also exist, and the nonstationarity must
be reflected in the characteristics of the multilayer neural network adjustment
algorithms. In the learning mode, the pattern distribution functions of each class are time
dependent. In the self-learning mode (clustering), both the coordinates of the class centers
and their other characteristics can be time dependent.


17.6.2 
The Multilayer Neural Network Output Signal Generation 


The output signal of the neural network is formed according to the type of problem being
solved. This signal can be a binary value (or a vector of binary values), a K-valued quantity
(or a vector of K-valued quantities), or a real-valued variable (or a vector of real-valued
variables). The number of neurons in the output layer of the neural network and the form of
their activation function are determined by the type of output signal. In a particular case,
the output signal is a function of some spatial argument. The output layer is then a neuron
continuum with real-valued output signals rather than a discrete set of neurons.


17.6.3 
Formation of the Primary Optimization Criteria 


The primary optimization criteria of multilayer neural networks are based on the following:

- The assumed probabilistic concept of the external world;
- The treatment of the external world as an essentially nonlinear one.

It is this basis that allows the primary optimization criterion to be formed as the main goal
that the designer wants to achieve when developing a multilayer neural network with an
adaptation algorithm for a particular problem. The probabilistic criteria described below are
valid for a relatively wide class of problems, and this class can be enlarged further.

Recently developed methods for the synthesis of multilayer neural network adaptation
algorithms can be used with the following primary optimization criteria (here $p_1$, $p_2$
are the a priori probabilities of the classes and $r_1$, $r_2$ the conditional average risks for
patterns of the first and second classes):

- $\min R = p_1 r_1 + p_2 r_2$, the minimum of the average risk function;

- $\min p_1 r_1$ under $p_2 r_2 = \text{const}$;

- $\min R$ for K classes and for a continuum of classes;

- The aforementioned variants for two, K, and a continuum of classes;

- Different modifications of the aforementioned criteria, for example, the criterion of the
maximum of the a posteriori probability.


All these criteria can be used to solve particular practical problems. When they are applied
to neural network synthesis, the error-cost matrix must be formed. This matrix gives the
cost of deciding that a pattern of one class belongs to another class (in the continuum case,
it becomes an error function). As a rule, the known studies assume that the error-cost
matrices arising when a particular pattern is assigned to a particular class are diagonal.
However, this often does not agree with reality. For example, in the design of a
neurocomputer for a mine recognition system using a geolocator, in the error-cost matrix

$$\|l_{ij}\| = \begin{pmatrix} l_{11} & l_{12} \\ l_{21} & l_{22} \end{pmatrix}$$

the coefficients $l_{12}$ (the cost of the error of taking a mine for an irrelevant object) and
$l_{21}$ (the cost of the error of taking an irrelevant object for a mine) cannot be equal in
principle, and this must be taken into account during the adjustment of the multilayer
neural network, in the same way as was done in [17-2, 17-3].
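The effect of unequal error costs on the minimum-average-risk criterion can be checked numerically in a one-dimensional toy version of the mine example. Everything here is an assumption for illustration. The two classes are Gaussian with unit variance and means +1 (mine) and -1 (irrelevant object), correct decisions cost nothing ($l_{11} = l_{22} = 0$), and the rule declares a mine when x exceeds a threshold t. The conditional risks $r_1$, $r_2$ are taken as the cost-weighted conditional error probabilities.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def average_risk(t, p1, p2, l12, l21):
    """R = p1*r1 + p2*r2 for the rule 'mine if x > t' (l11 = l22 = 0 assumed).

    Class 1 (mine): x ~ N(+1, 1); class 2 (irrelevant object): x ~ N(-1, 1).
    r1 = l12 * P(x <= t | class 1), r2 = l21 * P(x > t | class 2).
    """
    r1 = l12 * phi(t - 1.0)
    r2 = l21 * (1.0 - phi(t + 1.0))
    return p1 * r1 + p2 * r2


def bayes_threshold(p1, p2, l12, l21):
    """Threshold minimizing R: solves p1*l12*f1(t) = p2*l21*f2(t), where f1/f2 = exp(2t)."""
    return 0.5 * math.log((p2 * l21) / (p1 * l12))


t_equal = bayes_threshold(0.5, 0.5, l12=1.0, l21=1.0)
t_mine = bayes_threshold(0.5, 0.5, l12=10.0, l21=1.0)
print(t_equal, t_mine)
print(average_risk(0.0, 0.5, 0.5, 10.0, 1.0), average_risk(t_mine, 0.5, 0.5, 10.0, 1.0))
```

With equal costs the optimal threshold is 0. Raising $l_{12}$ moves it towards the irrelevant-object class, so more borderline objects are reported as mines, and the average risk drops below its value at t = 0.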

The frequently used criterion of the minimum root-mean-square error is a primitive,
particular case of the aforementioned criteria.


17.6.4 
Selection of the Open Neural Network Structure


The a priori information about the neural network structure used at this stage of neural
network synthesis is the price paid for the universality of the solution. Two main classes of
neural network structures are used for problem solution at present:

- Neural networks with complete sequential connections;
- Neural networks with complete feedback connections.


Selecting the neural network structure involves the following:

- Selecting the number of neuron layers;

- Selecting the number of neurons in all layers except the last one (the number of neurons
in the last layer is selected at the stage of generating the neural network output signal);

- Selecting the activation function in all neuron layers except the last one (the activation
function of the last layer is also selected at the stage of generating the neural network
output signal).
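Once the number of layers, the number of neurons per layer, and the activation functions have been chosen, a network with complete sequential connections can be sketched as follows. The random initial weights, the tanh activations in the hidden layers, and the sign output neuron are illustrative assumptions, not prescriptions of the text.

```python
import math
import random


def make_network(sizes, seed=0):
    """Random initial weights for a network with complete sequential connections.

    sizes = [N, H1, ..., HL]: the input dimension followed by the number of
    neurons in each layer. Every neuron of a layer receives every output of the
    previous layer; the last weight of each neuron is its bias.
    """
    rng = random.Random(seed)
    return [[[rng.uniform(-1.0, 1.0) for _ in range(n_in + 1)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])]


def forward(layers, activations, x):
    """Propagate the input x layer by layer; activations[k] is used in layer k."""
    signal = list(x)
    for layer, act in zip(layers, activations):
        # zip stops at len(signal), so neuron[-1] (the bias) is added separately.
        signal = [act(sum(w * s for w, s in zip(neuron, signal)) + neuron[-1])
                  for neuron in layer]
    return signal


def sign(v):
    """Threshold activation for a single binary output neuron."""
    return 1 if v >= 0 else -1


net = make_network([3, 5, 2, 1])   # hypothetical sizes: N = 3, H1 = 5, H2 = 2, one output
print(forward(net, [math.tanh, math.tanh, sign], [0.1, 0.2, 0.3]))
```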


17.6.5 
Remarks about the Selection of the Open Neural Network Structure 
that is Adequate to the Class of Solution Tasks 


In most of the scientific literature, the structure of the open neural network is introduced
by the authors without any explanation. The main idea of the Russian works in this field is
the development of multilayer neural network adjustment algorithms adequate to the
particular problem being solved.

If a class of tasks allows one to define a class of effective neural network structures
adequate to these tasks, then special methods of coefficient adjustment developed
specifically for this class of structures will increase the effectiveness of adaptation in
solving tasks of this class. Below we describe some variants of neural network structures
and the classes of tasks adequate to them.


Neural networks with random connections. In his classical monograph, Rosenblatt suggested
introducing random connections between the retina and the first layer of the multilayer
neural network. With a sufficient increase in the number of neurons, the reliability of the
system with respect to the possible failure of several neurons increases. With present and
prospective microelectronic technology, the number of neurons emulated in very-large-scale
integrated circuits, on a board, and in a unit is quite large, and this number will continue to
grow. This makes random connections increasingly relevant for implementation and
research.
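Random connections between the retina and the first layer, in the spirit of Rosenblatt's perceptron, can be generated as follows. The fan-in, the ±1 weights, the zero threshold, and the helper names are hypothetical parameters chosen for illustration.

```python
import random


def random_connections(retina_size, n_units, fan_in, seed=0):
    """For each first-layer unit, choose fan_in distinct retina cells, each with a random +/-1 weight."""
    rng = random.Random(seed)
    return [[(cell, rng.choice((-1, 1))) for cell in rng.sample(range(retina_size), fan_in)]
            for _ in range(n_units)]


def first_layer_outputs(connections, retina, threshold=0):
    """Binary outputs of the first layer for a binary retina image (list of 0/1)."""
    return [int(sum(w * retina[cell] for cell, w in unit) > threshold) for unit in connections]


conn = random_connections(retina_size=64, n_units=16, fan_in=6)
image = [1 if i % 3 == 0 else 0 for i in range(64)]
print(first_layer_outputs(conn, image))
```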


Neural networks with lateral connections. This specific type of connection between layers
of a multilayer neural network is interesting from the viewpoint of implementing invariance
to a group of transformations, and it has been poorly investigated. This concerns not only
invariance to the simplest affine transformations, such as rotation, translation, and change
of scale, but also invariance to more complex transformations, as well as the search for
connection structures that ensure such invariance.


Cell-like neural networks. Cell-like neural networks are networks with a special topological
structure that is adequate, in particular, to pattern processing tasks. In this case, the natural
parallelism of the task results in natural parallelism in the structural organization of the
processing neural network. Cell-like neural networks are also adequate to other tasks with
natural parallelism, for example, grid generation and other tasks arising in the solution of
two-dimensional partial differential equations. When passing from 2D to 3D tasks, similar
three-dimensional cell-like neural networks will be adequate to three-dimensional pattern
processing, 3D grid generation, virtual reality, and the solution of three-dimensional partial
differential equations.
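The locality that makes cell-like networks suitable for two-dimensional partial differential equations can be illustrated with Jacobi relaxation for Laplace's equation, in which each interior cell is updated only from its four neighbours. This is a standard numerical scheme used here as a hypothetical illustration of the cell-like connection pattern, not as the author's method.

```python
def relax_laplace(grid, sweeps):
    """Jacobi relaxation on a 2D grid for Laplace's equation.

    Each interior cell acts like a 'neuron' connected only to its four
    neighbours; boundary cells keep their fixed values. All cells of one
    sweep are updated simultaneously, i.e. in parallel.
    """
    rows, cols = len(grid), len(grid[0])
    u = [row[:] for row in grid]
    for _ in range(sweeps):
        new = [row[:] for row in u]
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
        u = new
    return u


# Boundary values u = j (column index); the exact discrete solution inside is also u = j.
grid = [[float(j) if i in (0, 5) or j in (0, 5) else 0.0 for j in range(6)] for i in range(6)]
u = relax_laplace(grid, sweeps=500)
print([round(v, 6) for v in u[2]])
```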


Neural networks with feedback loops. The concept of a neural network with feedback loops
in its classical sense was introduced in [17-2]. In this case, feedback channels are
present in the structure of the adjusted multilayer neural network. These channels
transmit the intermediate and output signals of the neural network to the inputs of the
previous layers through delay lines (for a given number of cycles). In the 1960s and 1970s,
it was believed that such neural network structures could be used only for designing
memory units for various purposes. In the last ten to fifteen years, research on such
neural networks, conventionally called recurrent networks, has intensified sharply in the
international literature. Moreover, their range of application has expanded to include
function approximation and extrapolation and the dynamic control of systems. Neural
networks with feedback (recurrent neural networks) are natural control devices and
natural devices for identifying nonlinear controlled objects in nonlinear control systems.
This is similar to linear control systems, in which the z-filter is the linear control
device and the z-transform is the formal description of the controlled object. The neural
network with feedback channels is a typical example of how the structure of the adjusted
multilayer neural network is selected by the criterion of adequacy to the task being
solved, rather than simply because an author is familiar with a particular neural network
structure. For neural networks with feedback channels, additional research problems
arise, such as determining how much the number of neurons in the adjusted multilayer
neural network can be reduced by introducing feedback into its structure.
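A minimal sketch of the feedback idea follows, assuming a single neuron with a tanh activation whose own output returns to its input through a delay line of `delay` cycles. The function name, the weights `w_in` and `w_fb`, and the activation are illustrative assumptions, not the structure described in [17-2].

```python
import math


def run_with_feedback(inputs, w_in, w_fb, bias=0.0, delay=1):
    """Single neuron whose output re-enters its input after `delay` cycles.

    y[n] = tanh(w_in * x[n] + w_fb * y[n - delay] + bias); the delay line
    holds zeros before the first output is produced.
    """
    outputs = []
    for n, x in enumerate(inputs):
        fed_back = outputs[n - delay] if n >= delay else 0.0  # delay-line output
        outputs.append(math.tanh(w_in * x + w_fb * fed_back + bias))
    return outputs


# A single input pulse keeps circulating through the feedback loop, so the
# neuron still "remembers" it after the pulse has left its input.
echo = run_with_feedback([1.0, 0.0, 0.0, 0.0], w_in=1.0, w_fb=0.9)
```

With `w_fb = 0` the same neuron forgets the pulse immediately, which is the feed-forward case.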


Neural networks with variable (flexible) structure. Since the 1960s, multilayer neural
networks with variable structure have been an effective tool for solving pattern
recognition tasks [17-1]. In this class of adjustment algorithms, the neural network
structure (the number of neurons in the layers and the number of layers) grows during
adjustment until a specified value of the solution quality index is reached. The synthesis
of adjustment algorithms for multilayer neural networks with variable structure is a
promising technique for solving a wide range of practical tasks.
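The growth of the structure during adjustment can be illustrated with a deliberately simple, hypothetical case: a one-dimensional two-class problem in which first-layer threshold neurons (each a "hyperplane" x = t) are added one at a time until the training accuracy, used here as the solution quality index, reaches a target. The parity read-out of the fired neurons is an assumption made to keep the sketch short; it is not the variable-structure algorithm of [17-1].

```python
def grow_1d_classifier(samples, target_accuracy=1.0):
    """Grow a layer of threshold neurons until the training accuracy is reached.

    samples: list of (x, label) pairs with scalar x and label in {0, 1}.
    Each added neuron fires when x > t; the output flips its class every time
    x passes one more threshold (a parity read-out, assumed for simplicity).
    """
    data = sorted(samples)
    thresholds = []

    def predict(x):
        fired = sum(1 for t in thresholds if x > t)
        return data[0][1] ^ (fired % 2)

    def accuracy():
        return sum(predict(x) == y for x, y in data) / len(data)

    i = 0
    while accuracy() < target_accuracy and i < len(data) - 1:
        (x0, y0), (x1, y1) = data[i], data[i + 1]
        if y0 != y1:
            thresholds.append((x0 + x1) / 2)  # new neuron between the two classes
        i += 1
    return thresholds, predict
```

Lowering `target_accuracy` stops the growth earlier and leaves a smaller network.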


Continual neural networks. Continual neural networks [17-3] are used mainly in two cases:


• When the number of features in a layer is large;
• When signal or pattern processing is performed in real time, without preliminary
quantization of the input information.


It is shown in [17-2] that adjustment algorithms for neural networks with a continuum of
features or a continuum of neurons in a layer are a subject of independent study and
research.
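For a continuum of features, the weighted sum of an ordinary neuron becomes an integral over the feature variable t. The sketch below assumes the form g = ∫ a(t) x(t) dt with a sign activation and approximates the integral numerically by the trapezoidal rule; the function name and the integration interval are illustrative assumptions.

```python
def continual_neuron(weight_fn, signal_fn, a=0.0, b=1.0, steps=1000):
    """Neuron over a feature continuum t in [a, b].

    g = integral of weight_fn(t) * signal_fn(t) dt (trapezoidal rule),
    y = +1 for g > 0 and -1 otherwise.
    """
    h = (b - a) / steps
    total = 0.5 * (weight_fn(a) * signal_fn(a) + weight_fn(b) * signal_fn(b))
    for k in range(1, steps):
        t = a + k * h
        total += weight_fn(t) * signal_fn(t)
    g = total * h
    return g, (1 if g > 0 else -1)


# Weight function a(t) = 1 and input x(t) = t: g = 1/2, decision +1.
g, y = continual_neuron(lambda t: 1.0, lambda t: t)
```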


Complex neural networks. In neural networks of this type, the input signals and the
weighting coefficients are complex numbers, and all operations in the open neural network
and in the adjustment algorithms are performed in complex arithmetic. This type of neural
network is widely used for nonlinear signal processing.
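As an illustration, Python's built-in complex type is enough to show that both the forward pass and the adjustment step run in complex arithmetic. The sketch assumes a "split" tanh activation applied separately to the real and imaginary parts (one of several activations used for complex networks) and the complex LMS rule for a single linear weight; the function names are hypothetical.

```python
import math


def complex_neuron(a, x):
    """g = sum a_i x_i in complex arithmetic; split tanh on real and imaginary parts."""
    g = sum(ai * xi for ai, xi in zip(a, x))
    return complex(math.tanh(g.real), math.tanh(g.imag))


def complex_lms(samples, mu=0.1, epochs=200):
    """Adjust one complex weight a so that a * x approximates d.

    The update a += mu * e * conj(x) is a steepest-descent step on |e|^2
    (Wirtinger calculus, constant factor absorbed into mu).
    """
    a = 0j
    for _ in range(epochs):
        for x, d in samples:
            e = d - a * x
            a += mu * e * x.conjugate()
    return a


true_a = 0.5 - 0.5j
data = [(x, true_a * x) for x in (1 + 0j, 1j, 1 + 1j, 2 - 1j)]
learned = complex_lms(data)
```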


Interval neural networks. In this case, the input signals are determined not by their 
values but rather by the interval to which they belong. 
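For a single neuron, interval inputs can be propagated exactly with elementary interval arithmetic: each product a_i·[lo, hi] is an interval whose ends depend on the sign of a_i, and a monotonically increasing activation maps the interval of g onto the interval of y. The sketch assumes a tanh activation; `interval_neuron` is a hypothetical name.

```python
import math


def interval_neuron(x_intervals, weights, bias=0.0):
    """Exact output interval of y = tanh(bias + sum a_i x_i) for x_i in [lo_i, hi_i]."""
    g_lo = g_hi = bias
    for (lo, hi), a in zip(x_intervals, weights):
        # a * [lo, hi] = [a*lo, a*hi] if a >= 0, otherwise [a*hi, a*lo]
        g_lo += min(a * lo, a * hi)
        g_hi += max(a * lo, a * hi)
    return math.tanh(g_lo), math.tanh(g_hi)  # tanh is increasing


low, high = interval_neuron([(0.0, 1.0), (-1.0, 2.0)], [2.0, -1.0])
```

For several layers, propagating intervals layer by layer in this way gives guaranteed but generally wider bounds, because the dependence between intermediate signals is lost.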


17.6.6 
Remarks about the Activation Function Selection 


The selection of the activation function is an important element of the neural network
synthesis procedure. More than ten types of neuron activation function are described in
the literature, and they are usually selected arbitrarily. In [17-1, 17-2] the activation
function $(2/\pi)\,\mathrm{arctg}(Bg)$ was used, where $g$ is the analog output signal of
the neuron. From the end of the 1980s to the beginning of the 1990s, the sigmoid activation
function became widely used. Wavelet and RBF networks are, in fact, neural networks with an
activation function of a particular type.

As a rule, introducing a new, special type of activation function is an attempt to make
the neural network more adequate to the task being solved, in order to decrease the number
of neurons and adjusted coefficients. However, this goal is not always achieved, because
one wants to simplify not only the computation of the output signal from the input signal
but also the adjustment of the neural network. It should be noted that a more complicated
activation function considerably complicates the adjustment algorithm, since the
adjustment algorithm contains computational units that calculate the derivatives of the
activation function.
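The cost of a new activation function shows up in the derivative that the adjustment algorithm must compute alongside it. The sketch pairs the arctangent activation used in [17-1, 17-2], written here as (2/π)·arctg(Bg), and the sigmoid with their analytic derivatives, and checks them against a central finite difference; the function names are illustrative.

```python
import math


def arctg_activation(g, B=1.0):
    """y = (2/pi) * arctg(B g)."""
    return (2.0 / math.pi) * math.atan(B * g)


def arctg_derivative(g, B=1.0):
    """dy/dg = (2/pi) * B / (1 + (B g)^2)."""
    return (2.0 / math.pi) * B / (1.0 + (B * g) ** 2)


def sigmoid(g):
    """Logistic sigmoid, written to avoid overflow for large |g|."""
    if g >= 0:
        return 1.0 / (1.0 + math.exp(-g))
    z = math.exp(g)
    return z / (1.0 + z)


def sigmoid_derivative(g):
    s = sigmoid(g)
    return s * (1.0 - s)


def numeric_derivative(f, g, h=1e-6):
    """Central difference, used here only to check the analytic derivatives."""
    return (f(g + h) - f(g - h)) / (2 * h)
```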

On the whole, the problem of selecting the activation functions of the neurons in a
multilayer neural network is at present far from solved.


17.6.7 
Selection of the Multilayer Neural Network Structure According to its Hardware 
Implementation Technology 


Some types of open neural network structures are used because of the constraints of the
technology of neural network hardware implementation. The following neural networks
belong to this class:


• Neural networks with cross-connections (from the i-th layer to the (i + 2)-th and
subsequent layers), which decrease the number of implemented neurons at the cost of some
increase in the number of weight coefficients [17-2, 17-3];

• Neural networks that implement a feature continuum, a continuum of neurons in a layer,
etc., for the implementation of analog-to-digital neurocomputers and for signal and
pattern processing [17-3];

• Neural networks with weight coefficients of finite bit width, or with adaptation
algorithms that control the bit width of the weight coefficients.


The bit width of the weight coefficients used to process signals and patterns represented
with different bit widths is an independent and very important problem. A low bit width
of the weight coefficients introduces additional errors, while a high bit width leads to
a high system cost and long processing time. This problem was solved appropriately in the
hardware implementation of linear z-filters on the Inmos IMS A100 VLSI chip in 1986. The
bit width of these VLSI z-filters can be changed programmatically in the range 2 to 16,
with a corresponding decrease in processing speed as the bit width increases. In nonlinear
filtering, the bit width can be changed according to some complex criterion. Developing
adaptation algorithms for bit-width control in VLSI circuits is a problem for future
investigation. It arises in the following areas:

• Neural networks with Boolean weight coefficients, which simplify the implementation of
the open neural network; this simplification increases the number of neurons needed to
reach a given solution quality;

• Neural networks without multipliers, because multipliers are the main difficulty of
hardware implementation;

• Neural networks with constraints on the weight coefficients, which must be taken into
account in the formal description of the open neural network structure and in the
development of the adjustment algorithm [17-2, 17-3].
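To make the bit-width trade-off concrete, the sketch below assumes a symmetric uniform quantizer on [-w_max, w_max]: b ≥ 2 bits give 2^(b-1) - 1 positive levels, so the rounding error is at most half a quantization step, and b = 1 is treated as Boolean (sign) weights. This quantizer is a common textbook choice and an assumption here, not the scheme used in the IMS A100.

```python
def quantize(weights, bits, w_max=1.0):
    """Symmetric uniform quantizer for weights in [-w_max, w_max].

    bits >= 2: 2**(bits - 1) - 1 positive levels, step = w_max / levels.
    bits == 1: Boolean (sign) weights, +/- w_max.
    """
    if bits == 1:
        return [w_max if w >= 0 else -w_max for w in weights]
    levels = 2 ** (bits - 1) - 1
    step = w_max / levels
    return [max(-levels, min(levels, round(w / step))) * step for w in weights]


def max_quantization_error(weights, bits, w_max=1.0):
    return max(abs(w - q) for w, q in zip(weights, quantize(weights, bits, w_max)))


weights = [0.9, -0.33, 0.123, -0.77, 0.5]
for bits in (2, 4, 8, 16):
    print(bits, max_quantization_error(weights, bits))
```

Because the grids for different bit widths are not nested, the error for a particular weight need not fall at every step; only the bound of half a step decreases monotonically.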


Introducing constraints on the open neural network structure is a very significant problem
for analog, analog-to-digital, optical, molecular, and quantum neurocomputers, as well as
for the development of neurocomputers based on single-electron nanocircuits.


17.6.8 
Generation of the Secondary Optimization Functional in the Multilayer 
Neural Networks 


In general, the primary optimization functional describes the neural network optimization
criterion implemented at the hardware level. In contrast, the secondary optimization
functional must be defined through the input and intermediate signals of the multilayer
neural network and through the formal description of the open neural network structure.
In the simplest case, developing the secondary optimization functional means developing an
analytical transformation at the neural network output such that the second moment of the
resulting signal corresponds to, or equals, the primary optimization functional
[17-2, 17-3].
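In the simplest two-class case, the relation between the two functionals can be checked directly. Assuming targets and decisions in {-1, +1} and taking the probability of error as the primary functional (an illustrative choice), the error e = ε - y takes only the values 0 and ±2, so the second moment E[e²] equals four times the error probability. The function names are hypothetical.

```python
def error_probability(targets, decisions):
    """Primary functional (assumed): fraction of wrong decisions."""
    return sum(t != y for t, y in zip(targets, decisions)) / len(targets)


def second_moment_of_error(targets, decisions):
    """Secondary functional: mean of e**2 with e = target - decision."""
    return sum((t - y) ** 2 for t, y in zip(targets, decisions)) / len(targets)


targets = [1, -1, 1, 1, -1]
decisions = [1, 1, 1, -1, -1]  # two wrong decisions out of five
p_err = error_probability(targets, decisions)
moment = second_moment_of_error(targets, decisions)
```

In an actual adjustment algorithm the hard decision is replaced by a differentiable output so that the gradient exists; the equality then holds only approximately.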


17.6.9 
Generation of the Algorithm of the Search Procedure 
for the Secondary Optimization Functional Extremum 


In the 1960s and the beginning of the 1970s, neural networks were considered a particular
case of a nonlinear multidimensional object with adjustable parameters implementing an
adaptive self-learning control system.

In that period, adaptive self-learning systems were developed mainly in the following two
directions:


• Search systems, in which artificial search oscillations of the adjusted parameters are
used to calculate the gradient of the optimization functional;

• Analytical systems without artificial search oscillations of the adjusted parameters, in
which the gradient of the optimization functional was calculated directly from the current
input and output signals of the neural network, as for a particular controlled object.
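The difference between the two directions can be shown on a single neuron. The sketch assumes y = tanh(Σ a_i x_i) and the squared error (d - y)²: the analytical gradient is formed from the neuron's own signals, while the search estimate perturbs each parameter by ±delta and measures the change of the functional, which costs two extra evaluations per parameter. The names and the tanh activation are illustrative assumptions.

```python
import math


def neuron_output(a, x):
    return math.tanh(sum(ai * xi for ai, xi in zip(a, x)))


def loss(a, x, d):
    """Squared error (d - y)^2 of the neuron for the target d."""
    return (d - neuron_output(a, x)) ** 2


def analytical_gradient(a, x, d):
    """Gradient formed from the neuron's own signals x, y (no probing)."""
    y = neuron_output(a, x)
    return [-2.0 * (d - y) * (1.0 - y * y) * xi for xi in x]


def search_gradient(a, x, d, delta=1e-4):
    """Gradient from artificial search oscillations +/-delta of each parameter."""
    grad = []
    for i in range(len(a)):
        up = list(a)
        up[i] += delta
        down = list(a)
        down[i] -= delta
        grad.append((loss(up, x, d) - loss(down, x, d)) / (2 * delta))
    return grad
```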


Analytical self-learning methods were developed mainly in Russian studies, although search
methods were dominant in the theory and practice of self-learning automatic control.

The following four variants of gradient search for the extremum of the secondary
optimization functional were used before 1974:


• Gradient method;

• Gradient method with time averaging of the gradient estimate;

• Gradient method with constraints on the neural network weight coefficients;

• Combination of random selection of the initial conditions of the search with a gradient
procedure for finding and analyzing the local extrema.
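The first three variants can be written as one update rule. In this hypothetical sketch, `beta = 0` gives the plain gradient method, `beta > 0` replaces the current gradient estimate by its exponential time average (one possible form of time averaging), and `bounds` clips the weights after each step as a simple way of enforcing constraints.

```python
def adjust(w, grad_fn, k=0.1, steps=100, beta=0.0, bounds=None):
    """Gradient adjustment w <- w - k * G.

    beta > 0: G is an exponential time average of the gradient estimates.
    bounds = (low, high): weights are clipped to the box after each step.
    """
    w = list(w)
    avg = [0.0] * len(w)
    for _ in range(steps):
        g = grad_fn(w)
        avg = [beta * m + (1.0 - beta) * gi for m, gi in zip(avg, g)]
        w = [wi - k * m for wi, m in zip(w, avg)]
        if bounds is not None:
            low, high = bounds
            w = [min(high, max(low, wi)) for wi in w]
    return w


def grad(w):
    # Gradient of the hypothetical functional J(w) = sum (w_i - 2)^2.
    return [2.0 * (wi - 2.0) for wi in w]
```

With `bounds=(-1.0, 1.0)` the unconstrained optimum w = 2 is unreachable and the weights settle on the boundary.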


Experience has shown the high efficiency of analytical search methods in which the
gradient of the secondary optimization functional is estimated from the current output and
intermediate signals.

The main problems in the search for the extremum of the secondary optimization functional
stem from the fact that this functional is multi-extremal and is defined over a
high-dimensional space of neural network adjustment coefficients.

For this reason, modern neural network methods of searching for the extremum of the
secondary optimization functional need to be developed in several directions, some of
which are described below.


17.6.9.1 
The Control of Parameters in the Extremum Search Procedure 
for the Multi-Extremum Secondary Optimization Functional 


The gradient procedure for the local extremum search of the multi-extremal optimization
functional is a very important element of the multilayer neural network adjustment
algorithm. In the simplest case, the coefficient $K^*$ of the functional gradient is
determined empirically for each specific task and is kept constant during adjustment.
Since the 1960s, researchers have tried to make $K^*$ variable (decreasing with time)
during adjustment in order to decrease the steady-state adjustment error. However, this led
to a sharp increase in the adjustment duration (the transient process of multilayer neural
network adjustment). At present, much of this problem remains open. Some attempts have
been made to control $K^*$ according to the current error value and the gradient of the
functional.
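The trade-off described above can be reproduced on a toy problem. The sketch assumes the functional J(w) = w²/2, a gradient estimate disturbed by a deterministic alternating error of ±1 (a stand-in for estimation noise), and the hypothetical schedule K*(n) = 0.5/(1 + n/10). With a constant K* the weight keeps oscillating around the optimum; with the decreasing schedule the steady-state error almost vanishes, but even the noise-free transient takes more cycles.

```python
def constant(n):
    return 0.5


def decreasing(n):
    return 0.5 / (1.0 + n / 10.0)  # hypothetical schedule K*(n)


def run(step, n_steps=2000, w0=5.0, noise=1.0):
    """Minimize J(w) = w**2 / 2 (gradient w) from a disturbed gradient estimate."""
    w = w0
    for n in range(n_steps):
        estimate = w + noise * (-1) ** n  # true gradient plus alternating error
        w -= step(n) * estimate
    return w


def steps_to_reach(step, tol=0.01, w0=5.0):
    """Noise-free transient: number of cycles until |w| < tol."""
    w, n = w0, 0
    while abs(w) >= tol:
        w -= step(n) * w
        n += 1
    return n
```

With these settings, `run(constant)` ends near 1/3 while `run(decreasing)` ends close to zero, and `steps_to_reach(constant)` returns 9 cycles, fewer than the decreasing schedule needs.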


17.6.9.2 
The Modifications of the Global Extremum Search Algorithms for the Multi-Extremal 
Secondary Optimization Functional 


The secondary optimization functional of a multilayer neural network is inherently
multi-extremal. The reasons are as follows:


• The input signal is rather complex (for example, in pattern recognition, the distribution
of the set of patterns in the multidimensional feature space is multimodal);
• The task admits multiple solution variants;
• The open neural network structure is flexible.


Methods of searching for the global extremum (or, additionally, for several local extrema)
are at present only at an early stage of development.

One such method, based on repeated random drawing of initial conditions in the space of
neural network weight coefficients followed by a search for the global extremum of the
secondary optimization functional, was presented and investigated in [17-2]. The
convergence of this procedure with respect to the number of initial conditions, for a fixed
number of local extrema, was shown there. Some modifications of this procedure that reduce
the time of the global extremum search are known; the method of "annealing" is an example.
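The random-initial-conditions method can be sketched on a one-weight example. The functional below is hypothetical (a local minimum near w = 1.13 and the global minimum near w = -1.30); each random start is followed by plain gradient descent, and the best local extremum found is kept. Step size, number of starts, and range are assumptions.

```python
import random


def f(w):
    # Hypothetical multi-extremal functional of one weight.
    return w ** 4 - 3 * w ** 2 + w


def df(w):
    return 4 * w ** 3 - 6 * w + 1


def descend(w, step=0.01, iters=2000):
    for _ in range(iters):
        w -= step * df(w)
    return w


def multistart(n_starts=20, low=-2.0, high=2.0, seed=0):
    """Random initial conditions + gradient descent; keep the best local extremum."""
    rng = random.Random(seed)
    minima = [descend(rng.uniform(low, high)) for _ in range(n_starts)]
    return min(minima, key=f)
```

A start drawn to the right of the local maximum (near w = 0.17) descends into the local minimum, so the chance of missing the global one falls quickly as the number of starts grows.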

The method of "annealing" can be used in the adjustment process as follows. The
independent variables (the neuron weights, in the case of a neural network) undergo random
changes. The values of the minimized neural network functional are stored for each changed
set of variables, and the best set is then selected. A relatively large range of the
random-value generator that changes the neuron weights is taken at the beginning of the
process. After several changes, the set of variable values (weights) corresponding to the
best functional value is selected, and this set is then taken as the initial one for the
next series of random changes, but with a decreased range of the random-value generator.
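The procedure just described can be sketched as a random search with a shrinking range. As described, it accepts only improvements; this differs from the classical simulated annealing of Kirkpatrick et al., which also accepts worse points with a temperature-dependent probability. The radius, shrink factor, number of trials, and the test functional are illustrative assumptions.

```python
import random


def anneal(f, w0, radius=2.0, shrink=0.7, rounds=30, trials=50, seed=0):
    """Random search with a shrinking range, following the description above.

    Each round draws `trials` random changes of the current best weights within
    +/- radius, keeps the best set found, then multiplies the radius by `shrink`.
    """
    rng = random.Random(seed)
    best_w, best_f = list(w0), f(w0)
    for _ in range(rounds):
        center = list(best_w)
        for _ in range(trials):
            candidate = [c + rng.uniform(-radius, radius) for c in center]
            value = f(candidate)
            if value < best_f:
                best_w, best_f = candidate, value
        radius *= shrink
    return best_w, best_f


def functional(w):
    # Hypothetical multi-extremal functional of one weight (global minimum near -1.30).
    return w[0] ** 4 - 3 * w[0] ** 2 + w[0]


w_best, f_best = anneal(functional, [0.0])
```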

The gradient algorithm is effective in finding local minima when neural network weights
are adjusted during learning. In general, a combination of the "annealing" and gradient
methods is the most effective algorithm. First, the "annealing" algorithm is used to find
the initial weights. The gradient descent algorithm is then used to bring the system to the
nearest local minimum. Then the "annealing" algorithm is applied again at this point in
order to leave this local minimum. These stages are repeated as long as it is possible to
leave the current local minimum. When this is no longer possible, the global minimum can be
considered to have been reached.
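The alternation of the two methods can be sketched as follows, again on a hypothetical one-weight functional with a local minimum near w = 1.13 and the global minimum near w = -1.30. Gradient descent reaches the nearest minimum, annealing-style random jumps try to find a better point, and the loop stops when no jump improves the functional by more than `tol`. All parameter values are assumptions.

```python
import random


def f(w):
    # Hypothetical multi-extremal functional of one weight.
    return w ** 4 - 3 * w ** 2 + w


def df(w):
    return 4 * w ** 3 - 6 * w + 1


def gradient_descent(w, step=0.01, iters=5000):
    for _ in range(iters):
        w -= step * df(w)
    return w


def escape(w, rng, radius=3.0, trials=100):
    """Annealing-style random jumps around w; return the best point found."""
    best = w
    for _ in range(trials):
        cand = w + rng.uniform(-radius, radius)
        if f(cand) < f(best):
            best = cand
    return best


def hybrid(w, seed=0, tol=1e-9, max_cycles=20):
    rng = random.Random(seed)
    w = gradient_descent(w)
    for _ in range(max_cycles):
        cand = escape(w, rng)
        if f(cand) >= f(w) - tol:
            break  # the current minimum could not be left: treat it as global
        w = gradient_descent(cand)
    return w
```

Starting from w = 1.5, gradient descent alone stops in the local minimum, while `hybrid(1.5)` continues to the global one.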


17.6.9.3 
Filtering and Extrapolation of the Signal Corresponding to the Estimation of the
Secondary Optimization Functional Gradient 


As a rule, in the known neural network adaptation algorithms, the decision to change the
weight coefficients is made in each operation cycle from the results of passing a single
pattern through the network. Experience with a filter of memory $m_n \neq 1$ [17-2] in the
adjustment circuit showed increased adjustment effectiveness for both stationary and
non-stationary patterns at the multilayer neural network input. Some attempts to speed up
learning by extrapolating the weight coefficients during neural network adjustment are
known from the literature. The synthesis of the filter in the weight adjustment circuit has
been poorly investigated, although it is a promising part of the general procedure of
multilayer neural network synthesis.

The following parameters must be determined when developing the multilayer neural network
adjustment algorithm:

• The filter memory;
• The filter type, according to the a priori selected form of nonstationarity of the neural
network input signal.
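A filter with memory m in the adjustment circuit can be sketched as a moving average of the gradient estimates obtained from the last m patterns; m = 1 reproduces the usual one-pattern-per-cycle rule. The rectangular (moving-average) window and the class name are assumptions; for non-stationary inputs an exponentially weighted window would be a natural alternative.

```python
from collections import deque


class GradientFilter:
    """Averages the gradient estimates of the last `memory` patterns."""

    def __init__(self, memory):
        self.buffer = deque(maxlen=memory)  # oldest estimate drops out automatically

    def update(self, grad):
        self.buffer.append(list(grad))
        n = len(self.buffer)
        return [sum(g[i] for g in self.buffer) / n for i in range(len(grad))]
```

The filtered vector would then replace the raw single-pattern gradient in the weight update.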


17.6.9.4 
The Multilayer Neural Network Adaptation Algorithms with the Adjustment 
of the Coefficients for the “Slope” of the Activation Function 


When an activation function with a variable "slope" is used, an individual neuron of the
neural network is described by the expression
$$
y = \frac{2}{\pi}\,\mathrm{arctg}\Big(B \sum_{i=0}^{N} a_i x_i\Big)
  = \frac{2}{\pi}\,\mathrm{arctg}\Big(\sum_{i=0}^{N} (B a_i)\, x_i\Big).
$$
It follows that, when constructing adaptation algorithms, there is no point in organizing
adjustment circuits for both $B$ and all $a_i$ of a single neuron, since only the products
$B a_i$ enter the output. In a neural network in which a separate subset of neurons (or all
neurons) have activation functions with the same "slope," an adjustment circuit for the
common coefficient $B$, in addition to those for the individual coefficients, is needed in
order to decrease the total adjustment duration, i.e., the time required to solve the task.
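The redundancy of adjusting B together with all a_i can be checked numerically: scaling every a_i by B leaves the output unchanged. The sketch also gives the derivative of y with respect to a shared slope B, which an adjustment circuit for B would need; the function names and numbers are illustrative.

```python
import math


def neuron(a, x, B=1.0):
    """y = (2/pi) * arctg(B * sum_i a_i x_i); x[0] = 1 plays the role of x_0."""
    g = sum(ai * xi for ai, xi in zip(a, x))
    return (2.0 / math.pi) * math.atan(B * g)


def dy_dB(a, x, B):
    """Derivative of y with respect to the slope: (2/pi) * g / (1 + (B g)^2)."""
    g = sum(ai * xi for ai, xi in zip(a, x))
    return (2.0 / math.pi) * g / (1.0 + (B * g) ** 2)


a = [0.5, -1.0, 2.0]
x = [1.0, 0.3, 0.7]
# Same output: slope B = 3 with weights a, or slope 1 with weights 3 * a.
y_slope = neuron(a, x, B=3.0)
y_scaled = neuron([3.0 * ai for ai in a], x, B=1.0)
```

For a group of neurons sharing one B, the gradient with respect to B collects such per-neuron contributions weighted by the error terms, so the whole group adds only one adjusted coefficient.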


17.6.9.5 
About the Use of the Second Derivative of the Secondary Optimization Functional 


Adjustment algorithms for multilayer neural networks that use the second derivative of the
secondary optimization functional were developed at the end of the 1960s, and the work of
the 1990s did not contribute significantly to this field. Experiments carried out at the
beginning of the 1970s showed that the noise level of the second-derivative estimates is
very high, which makes the use of the second derivative inefficient. The situation is the
same at present.


17.6.9.6 
Selection of the Initial Conditions for the Gradient Procedure of the Extremum 
Search of the Secondary Optimization Functional 


The choice of the initial weight coefficients of an adaptive neural network is an important
factor in speeding up the task solution. Therefore, in our view, the widespread practice of
choosing zero weight coefficients, or random values uniformly distributed in a given range,
as the initial conditions is incorrect.

Already in the problem of recognizing two pattern classes, it was clear that the initial
weight coefficients should be chosen so that the dividing surface implemented by the neural
network forms a multidimensional "chess-board" with uniformly distributed "black" and
"white" squares. The two colors correspond to the first and second pattern classes in the
physically implemented multidimensional feature space [17-2, 17-3]. Such a multidimensional
"chess-board" is formed by the hyperplanes corresponding to the neurons of the first layer.
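A hypothetical construction of such initial weights: axis-parallel hyperplanes x_j = c, evenly spaced over a region of the feature space, cut it into a uniform chess-board, and the color of a cell is the parity of the number of first-layer neurons that fire. The region [low, high]^N, the bias convention x_0 = 1, and the parity read-out are assumptions made for illustration; the read-out only shows which color each cell receives and is not a proposed output layer.

```python
def chessboard_first_layer(n_features, cuts_per_axis, low=0.0, high=1.0):
    """First-layer weights (a_0, a_1, ..., a_N) for hyperplanes x_j = c that cut
    [low, high]^N into a uniform chess-board."""
    weights = []
    step = (high - low) / (cuts_per_axis + 1)
    for j in range(n_features):
        for k in range(1, cuts_per_axis + 1):
            c = low + k * step
            a = [0.0] * (n_features + 1)
            a[0] = -c       # bias weight (x_0 = 1)
            a[j + 1] = 1.0  # weight of feature j
            weights.append(a)
    return weights


def square_color(x, weights):
    """0 = 'white', 1 = 'black': parity of first-layer neurons that fire."""
    xs = [1.0] + list(x)
    fired = sum(1 for a in weights if sum(ai * xi for ai, xi in zip(a, xs)) > 0)
    return fired % 2
```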

The initial weight coefficients of an adaptive neural network can also be taken from a
neural network with variable structure after the end of its learning stage. As mentioned
above, this is possible only for pattern recognition and for clustering (self-learning),
for which adjustment algorithms for multilayer neural networks with flexible structure have
been developed.

Function approximation (extrapolation) is an example of a problem for which the choice of
initial weight coefficients for multilayer neural network adjustment can be solved
effectively. Here, regarding the neural network as an effective nonlinear filter
(extrapolator), it is expedient to take as the initial conditions the weight coefficients
of a neural network that implements the equivalent nonlinear filter or extrapolator.
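One way to read this recommendation, assuming a single neuron with the (2/π)·arctg(Bg) activation: near g = 0 the activation is approximately (2B/π)·g, so scaling the coefficients of a known linear extrapolator by π/(2B) makes the neuron start out as that extrapolator for small signals. The two-point extrapolator x(n+1) ≈ 2x(n) - x(n-1), the value of B, and all names are illustrative assumptions; the text refers more generally to an equivalent nonlinear filter.

```python
import math

B = 1.0  # slope of the activation (assumed)


def neuron(a, x):
    """y = (2/pi) * arctg(B * sum_i a_i x_i)."""
    g = sum(ai * xi for ai, xi in zip(a, x))
    return (2.0 / math.pi) * math.atan(B * g)


# Hypothetical equivalent filter: x(n+1) ~ 2 x(n) - x(n-1), inputs [x(n), x(n-1)].
filter_coeffs = [2.0, -1.0]

# Near g = 0, (2/pi) arctg(B g) ~ (2B/pi) g, so this scaling makes the neuron
# reproduce the filter for small signals.
initial_a = [c * math.pi / (2.0 * B) for c in filter_coeffs]

prediction = neuron(initial_a, [0.03, 0.02])  # filter gives 2*0.03 - 0.02 = 0.04
```

For large signals the saturating activation departs from the linear filter, which is where subsequent adjustment takes over.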

Hence, the choice of the initial conditions for multilayer neural network adjustment has
the following properties:


• It is specific to each task solved by the neural network;

• It is aimed at accelerating the adjustment process (and therefore the task solution) by
placing the neural network in the domain of the global extremum of the secondary
optimization functional;

• As a result, it increases the equivalent productivity-to-cost ratio for the specific task.


Examples of particular solutions to the problem of choosing the initial conditions are
given in [17-7].


17.6.10 
Formation of the Adaptation Algorithms in the Multilayer Neural Networks 


The basis for forming adjustment (adaptation) algorithms in multilayer neural networks
includes:


• The analytical expression for the secondary optimization functional and the analytical
expression for its first derivative, or an estimate of the sign of the first derivative;

• The analytical expression for the search for the extremum of the secondary optimization
functional using its first derivative, expressed through the input and output signals of
the adjusted multilayer neural network.
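Putting the two ingredients together for a single neuron with the (2/π)·arctg(Bg) activation and the squared error J = (ε - y)² (both assumptions for this sketch), the derivative ∂J/∂a_i = -2(ε - y)·(dy/dg)·x_i is expressed entirely through the signals x_i, g and y, and the extremum search is a gradient step, optionally using only the sign of the derivative. Names and the step size k are hypothetical.

```python
import math


def forward(a, x, B=1.0):
    g = sum(ai * xi for ai, xi in zip(a, x))
    return g, (2.0 / math.pi) * math.atan(B * g)


def gradient(a, x, eps, B=1.0):
    """dJ/da_i for J = (eps - y)^2, expressed through the signals x, g, y."""
    g, y = forward(a, x, B)
    dy_dg = (2.0 / math.pi) * B / (1.0 + (B * g) ** 2)
    return [-2.0 * (eps - y) * dy_dg * xi for xi in x]


def adapt(a, data, k=0.1, epochs=20, sign_only=False):
    """a_i <- a_i - k * dJ/da_i, or - k * sign(dJ/da_i) if only the sign is used."""
    a = list(a)
    for _ in range(epochs):
        for x, eps in data:
            grad = gradient(a, x, eps)
            if sign_only:
                grad = [(gi > 0) - (gi < 0) for gi in grad]
            a = [ai - k * gi for ai, gi in zip(a, grad)]
    return a


def total_error(a, data):
    return sum((eps - forward(a, x)[1]) ** 2 for x, eps in data)
```

Here x[0] = 1 serves as the bias input, so a two-element weight vector covers one feature.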


17.7 
Verification of the Adjusted Multilayer Neural Network 


Developing a special test system for neural networks with different structures is an
important means of increasing the reliability of future neurocomputers. The development of
such tests for adjusted neural networks is a promising direction of research in neural
network theory. This section applies both to neural networks with variable structure and
to those with fixed structure.

Developing typical classes of neural network input signals is necessary for an objective
test of the performance quality of an adaptive neural network. Furthermore, the system of
tests is always specific to the task being solved.

In control theory, the class of signals specified by their Laplace transforms is a typical
example of such an input signal class. There, the performance quality of a control system
is tested by feeding the corresponding typical signals (δ-function, unit step, linear
signal, etc.) to the system input and then analyzing the transient process and the
steady-state error (the order of astaticism of the control system).

A typical class of neural network input signals must always have some parameter
characterizing the complexity of the task. In the example above, this parameter is evident.
As early as the beginning of the 1960s, for pattern recognition tasks dealing with random
samples from complex, unknown multimodal distributions, random samples from multimodal
distributions were proposed as typical neural network input signals. Each mode was a normal
distribution, and the mode centers were located along the hyperbisector of the
multidimensional feature space, alternating between the classes [17-2, 17-3]. Two
parameters served as indexes of task complexity: the number of distribution modes and the
variance of each mode. In the self-learning mode of the neural network, the information on
which class each mode belongs to was absent.
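A generator for such a typical input signal class might look as follows. It assumes equal spherical normal modes with variance `variance`, centers at m·`spacing` along the hyperbisector x_1 = ... = x_N, and class labels alternating from mode to mode; `n_modes` and `variance` are the two complexity parameters described above, while `spacing`, the seed, and the function name are illustrative assumptions.

```python
import random


def multimodal_two_class_sample(n_features, n_modes, variance, n_per_mode,
                                spacing=1.0, seed=0):
    """Samples from normal modes centered on the hyperbisector x_1 = ... = x_N.

    Mode m has center (m * spacing, ..., m * spacing) and class label m % 2,
    so the classes alternate along the bisector.
    """
    rng = random.Random(seed)
    sigma = variance ** 0.5
    samples = []
    for m in range(n_modes):
        center = m * spacing
        for _ in range(n_per_mode):
            x = [center + rng.gauss(0.0, sigma) for _ in range(n_features)]
            samples.append((x, m % 2))
    return samples
```

For a self-learning test, the labels would simply be withheld from the network.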

Selecting a typical class of neural network input signals is an important task for a
researcher who wishes to demonstrate, more or less objectively, the advantages of a neural
network algorithm developed for a specific, formulated task.


17.8 
Elaboration of the Plan of Experiments 


All undetermined parameters of the neural network and of the input and output test signals
must be taken into account when developing the plan of experiments. These parameters must
be ordered and presented as an experimental plan for the developed neural network solution
algorithm.

The problem of a small training sample of the input signal presented to the neural network
must be considered separately. This problem is significant in a large number of information
processes aimed at decision making. It was also significant in classical statistical
decision making. There, however, the strict limitations associated with a small sample made
it necessary to use a priori information about the distribution function of the processed
signals. Neural network methods of information processing were developed precisely because
such a priori information is absent, and the distribution functions may therefore be very
different, complex, or unknown.

The problem of a small sample is very significant in the following two cases:


1. When the number of measurements is small and cannot, in principle, be increased;
2. When the number of measurements is increasing, but decisions must be made as the
measurement results arrive.


Below we present the part of the technique for planning experiments that deals with the
performance of a multilayer neural network with fixed structure adjusted on a small
learning sample:


1. Repeatedly presenting a relatively small sample at the input of the neural network is an
effective technique for increasing the adequacy of decisions made by the multilayer neural
network. Here, the adequacy of a decision means the estimate of the probability of correct
recognition, the mean-square error of function approximation, or any other evaluation,
depending on the task solved by the multilayer neural network;

2. One possible way to exploit the neural network's property of generalization by
similarity is to generate additional samples artificially at the neural network input. The
additional samples must have mathematical expectations equal to the components of the
initial small sample, and their variance is chosen separately. The value of the variance
may also be changed in the course of executing the plan of experiments;

3. The selection of initial conditions for multilayer neural network adjustment is a very
important problem that strongly influences the speed of computations in the neural network
logical basis (the speed of convergence of the adjustment algorithms to one of the local
extrema or to the global extremum of the optimization functional) and the quality of the
task solution. Different methods of selecting initial conditions give different adjustment
results each time. Averaging over these results then gives, for the limited given sample,
additional information about the neural network performance quality. In this case, the
initial conditions can be chosen either randomly (by generating the initial weight
coefficients with a random-value generator) or by calculation on the limited sample using
adjustment algorithms with variable structure;

4. Dividing the initial sample into smaller samples, adjusting the multilayer neural network
with the techniques of points 2 and 3, and averaging the results over the set of these
smaller parts of the initial sample can be regarded as an additional method of analyzing
the generalization properties of the neural network;

5. The final stage of the suggested technique is averaging the adjustment results over the
set of variants mentioned in points 2, 3, and 4. The resulting distribution function of the
quality index will have some mathematical expectation and variance. The mathematical
expectation of the quality index is the main characteristic of the multilayer neural network
operating on the given sample, and the variance of the quality index estimates its
uncertainty. If the variance is high, steps must be taken to improve the solution quality;
one possible step is to increase the structural complexity of the multilayer neural
network.
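A compact sketch of points 1 to 5, under stated assumptions: the "network" is a single threshold neuron adjusted with the perceptron rule (the epochs play the role of repeated presentation, point 1), additional samples are Gaussian perturbations of the original ones with a chosen standard deviation (point 2), initial weights are drawn at random (point 3), the sample is split into interleaved parts (point 4), and the quality index is the recognition accuracy on the whole original sample, summarized by its mean and variance (point 5). All names and parameter values are hypothetical.

```python
import random
import statistics


def train_perceptron(data, w0, epochs=20, rate=0.1):
    """Adjust one threshold neuron (bias + weights) with the perceptron rule."""
    w = list(w0)
    for _ in range(epochs):
        for x, label in data:  # label in {-1, +1}
            xs = [1.0] + list(x)
            y = 1 if sum(wi * xi for wi, xi in zip(w, xs)) > 0 else -1
            if y != label:
                w = [wi + rate * label * xi for wi, xi in zip(w, xs)]
    return w


def accuracy(w, data):
    hits = 0
    for x, label in data:
        g = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
        hits += (1 if g > 0 else -1) == label
    return hits / len(data)


def augment(data, copies, sigma, rng):
    """Point 2: extra samples whose means are the original samples."""
    out = list(data)
    for x, label in data:
        for _ in range(copies):
            out.append(([xi + rng.gauss(0.0, sigma) for xi in x], label))
    return out


def small_sample_plan(data, n_inits=5, n_parts=2, sigmas=(0.05, 0.2), copies=10, seed=0):
    rng = random.Random(seed)
    dim = len(data[0][0])
    scores = []
    for sigma in sigmas:                        # point 2: different variances
        for part in range(n_parts):             # point 4: smaller sub-samples
            train = augment(data[part::n_parts], copies, sigma, rng)
            for _ in range(n_inits):            # point 3: random initial conditions
                w0 = [rng.uniform(-1.0, 1.0) for _ in range(dim + 1)]
                scores.append(accuracy(train_perceptron(train, w0), data))
    # Point 5: mean and variance of the quality index over all variants.
    return statistics.mean(scores), statistics.pvariance(scores), len(scores)
```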


For a multilayer neural network with flexible structure, the methods of points 2 and 4 can
be used when working with a small sample.

The presented technique illustrates how a shortage of experimental information can be
partially compensated by additional computational resources. It can be used not only for
pattern recognition or function approximation with a relatively small number of
experimental observations but also for more general tasks.


17.9 
About the Importance of the Unification of Designations in the 
Process of Synthesis of the Neural Network Adjustment Algorithms 


Research in this field shows that the essence of published studies and the specific
features of different algorithmic approaches would be much more transparent if the
designations used in the scientific literature were unified to some extent. We present
below one version of such a unification.


Designations 


x - the neural network input signal;

N - the dimensionality of the feature space;

a_i - the neural network weight coefficient;

i - the index of the feature (i = 0, ..., N);

g - the analog output signal of the neuron (the input signal of the unit implementing the
activation function, after the multi-input summation unit);

y - the output signal of the neuron or of the neural network;

K - the number of classes;

- the number of solutions;

- the activation function;

- the number of neurons in the first layer;

W - the number of layers in the multilayer neural network;

(w = 1, ..., W) - the number of neurons in the w-th layer of the multilayer neural network.


17.10
About Myths in Neural Network Theory 


Fuzzy logic is one of the foundations of the development of neural network theory. The
neural network is one of the most efficient means of implementing the fuzzy logic concept.

At the same time, various scientific fields emerge in the course of neural network theory
development. A detailed analysis, however, shows that these fields are only narrow,
particular cases of separate aspects of neural network theory. We therefore propose to
discuss some of these terms, such as


• Genetic algorithms;

• Support vector machines;

• Wavelet networks;

• RBF networks;

• Principal component analysis;
• Evolutionary programming.


Attempts to separate some parts of neural network theory and make them independent only
weaken these parts. The list above gives examples of such divisions of neural network
theory.

The classical methods of mathematical statistics and the method of potential functions,
which were actively discussed at the end of the 1960s and the beginning of the 1970s, are
similar particular interpretations of neural network theory.

The main idea is to replace the earlier "emotional" definitions of algorithms, and similar
definitions that will be proposed in the future, with a vector of quantitative parameters,
together with a quantitative explanation of why a new algorithm changes one or another of
these parameters.

Such a quantitative description can be made for the various known neural network types:


• Kohonen networks;

• Elman networks;

• Hopfield networks;

• ART neural networks, etc.


In this case, the quantitative limitations of the different neural networks will
immediately become apparent to their users.


17.11 
Conclusion 


In this section, we have presented what is, in our view, the best design cycle for neural
network solution algorithms that can be implemented at present.

The number of studies in the domain of neural network theory is increasing. This
strengthens the need to compare and classify in detail the different neural network
synthesis algorithms. This must be done by comparing the a priori information used for
neural network synthesis in each particular case:


1. A priori characteristics of the neural network “supervisor instruction” space, i.e.,
the number of pattern classes (two, K, continuum);

2. A priori characteristics of the nonstationarity of the neural network input signal;

3. The “supervisor qualification” function of the neural network, a function of two
arguments that are the indexes of the corresponding classes;

4. The function of the “supervisor’s own opinion” about its abilities, also a function
of two arguments that are the indexes of the corresponding classes;

5. A priori probabilities of the appearance of the classes;

6. A priori characteristics of the neural network solution space (two, K, continuum of
solutions);

7. The class of criteria for the primary optimization of the neural network;

8. The loss function that arises when a pattern of one class is erroneously assigned
to another class;

9. A priori information about the conditional distribution function f'(x/e);

10. A priori information about the fixed structure of the open neural network when
designing a neural network with fixed structure that is tuned in a closed cycle;

11. A priori information about the class of structures when designing a neural network
with flexible structure;

12. A priori information about the difference between the primary and secondary
optimization functionals when designing a neural network with fixed structure that
is tuned in a closed cycle;

13. A priori information about the method of searching for the extremum of the secondary
optimization functional;

14. A priori information about the constraints on the adjustable coefficients;

15. A priori information about the procedure for selecting the elements of the parametric
matrix K' of the system searching for the extremum of the secondary optimization
functional;

16. A priori information about the search oscillation parameters in the case when the
neural network adaptation algorithm cannot be designed in analytical form;

17. A priori information about the initial conditions for the adjustment procedure;

18. A priori information about the class of typical neural network input signals;

19. A priori information about the degree to which the open neural network structure is
complicated at each iteration step and about the form of this complication.
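Items 5, 8, and 9 are the ingredients of the classical minimum-average-risk decision rule of statistical decision theory. As a hedged illustration of how these three pieces of a priori information combine (not the author's synthesis procedure), the sketch below assumes scalar inputs, Gaussian class-conditional densities, and an illustrative loss matrix; the function names and all numbers are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Hypothetical class-conditional density of a scalar input x (assumed Gaussian)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def min_risk_decision(x, priors, densities, loss):
    """Return the index of the decision with minimum conditional risk.

    priors[k]    -- a priori probability of class k (item 5)
    densities[k] -- callable giving the conditional density of x for class k (item 9)
    loss[i][k]   -- loss of deciding class i when the true class is k (item 8)
    """
    # Unnormalized posterior weights p(k) * f(x | k); normalizing would not change the argmin.
    weights = [p * f(x) for p, f in zip(priors, densities)]
    risks = [sum(loss[i][k] * weights[k] for k in range(len(weights)))
             for i in range(len(loss))]
    return min(range(len(risks)), key=risks.__getitem__)
```

With two equiprobable classes centered at -1 and +1 and a 0-1 loss, the boundary lies at x = 0; making the error "decide class 0 when the truth is class 1" five times as costly shifts the boundary so that x = -0.5 is already assigned to class 1.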


An objective comparison between multilayer neural networks of different types must be
performed by comparing the a priori information used in their design and their
performance quality on typical and real input signals.

Table 17.1 compares the neural network synthesis procedures described in [17-2, 17-3]
with those in the large number of American studies on error back-propagation methods.
Figure 17.2 shows the structure of neural network synthesis for solution algorithms.
This structure was developed over several years. On the one hand, it includes the results
of a large number of investigations presented by their authors as “a new method...” or
“an original approach...”; however, these approaches are rather particular cases. On the
other hand, the structure is incomplete: there may be additional lines of neural network
theory that are not known to the author.

The author considers every new study that comes to his attention from the point of view
reflected in this figure and asks himself whether it might be a particular case of already
known work. This was the reason for writing the section “About Myths in the Neural
Network Theory” above. In any case, the author welcomes with great satisfaction all works
that further develop the structure shown in Fig. 17.2.

When neural network solution algorithms are considered, the problem of a so-called
“false” statement of the mathematical problem often arises. Let us explain it with some
examples. In the classical mathematical approach, solving a system of linear algebraic
equations is sometimes treated as a matrix inversion problem. However, the neural network
algorithms for these two problems are very different, and matrix inversion becomes
unnecessary; that is, the matrix inversion problem becomes “false”. A similar situation
arises in solving systems of ordinary nonlinear differential equations. These equations
are a formalized description of the behavior of physical objects. The use of
neurocomputers eliminates both the need for a formalized description of the physical
objects and the need to solve systems of ordinary nonlinear differential equations;
instead, the physical objects are described by neural networks of various structures.
It is precisely for this reason that neural network control significantly reduces the
interest in solving systems of ordinary nonlinear differential equations.
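The difference between the two problem statements for linear systems can be illustrated with a minimal sketch. Instead of computing the inverse of A, the iterative solver below lets the unknown vector x move along the negative gradient of the energy E(x) = ½‖Ax − b‖², i.e., dx/dt = −Aᵀ(Ax − b), discretized with a fixed step. This is plain gradient descent standing in for the network dynamics, not a specific algorithm from this book; the function name, step size, and iteration count are illustrative assumptions, and convergence requires the step to be smaller than 2/λ_max(AᵀA).

```python
def solve_linear_by_descent(A, b, step=0.1, iters=2000):
    """Solve A x = b by gradient descent on E(x) = 0.5 * ||A x - b||^2 (no matrix inversion)."""
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols  # initial condition of the adjustment procedure
    for _ in range(iters):
        # Residual r = A x - b.
        r = [sum(A[i][j] * x[j] for j in range(n_cols)) - b[i] for i in range(n_rows)]
        # Gradient of E(x) is A^T r.
        g = [sum(A[i][j] * r[i] for i in range(n_rows)) for j in range(n_cols)]
        x = [x[j] - step * g[j] for j in range(n_cols)]
    return x

x = solve_linear_by_descent([[3.0, 1.0], [1.0, 2.0]], [9.0, 8.0])
```

For A = [[3, 1], [1, 2]] and b = [9, 8], the iterates approach x = [2, 3] without any matrix ever being inverted; here λ_max(AᵀA) ≈ 13.1, so the step 0.1 lies inside the stability bound.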

Neuromathematics poses new problems for computational mathematics that either were not
solved before or were solved inadequately. Some of these problems are the following:


• The selection of initial conditions;
• The universalization of algorithms for solving different problems;
• The selection of classes of typical input signals for testing (verifying) the
performance quality of the neural network;
• The detection and rejection of “false” problem statements;
• The investigation and dynamic control of the solution procedure;
• The problem of the number of solutions related to the multi-extremum properties of
the optimization functional.


The development of neural network solution algorithms allows efficient selection of
initial conditions, as well as the use of several methods for dynamic control of the
problem-solving procedure, including control of the rate of convergence:
• Control of the parameters (choice of modification) of the iterative procedure that
searches for the extremum of the multi-extremum optimization functional during problem
solving;
• Filtering and extrapolation of the signal that estimates the gradient of the
optimization functional during problem solving;
• Adjustment of the activation function slope of the multilayer neural network;
• Selection and analysis of special neural network structures adequate to the class of
problems being solved (cellular-like or continuum neural networks, neural networks with
lateral, random, and feedback connections, neural networks with flexible structure).
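Two of the control methods listed above, filtering of the gradient estimate and adjustment of the activation function slope, can be sketched as follows. This is a minimal illustration under stated assumptions: the quadratic functional, the Gaussian noise on the gradient estimate, the first-order exponential filter, and the names `sigmoid`, `filtered_descent`, `beta`, and `step` are hypothetical choices for the example, not the author's procedure.

```python
import math
import random

def sigmoid(s, slope=1.0):
    """Logistic activation whose steepness is set by a (hypothetical) slope parameter."""
    return 1.0 / (1.0 + math.exp(-slope * s))

def filtered_descent(noisy_grad, w0, step=0.05, beta=0.9, iters=500):
    """Gradient descent driven by an exponentially filtered gradient estimate.

    noisy_grad(w) returns a noisy estimate of the functional's gradient at w;
    beta in [0, 1) sets how strongly past estimates are averaged (first-order low-pass filter).
    """
    w, g_filtered = w0, 0.0
    for _ in range(iters):
        g_filtered = beta * g_filtered + (1.0 - beta) * noisy_grad(w)
        w -= step * g_filtered
    return w

# Hypothetical functional E(w) = (w - 3)^2 whose gradient is observed with unit Gaussian noise.
rng = random.Random(0)
w_final = filtered_descent(lambda w: 2.0 * (w - 3.0) + rng.gauss(0.0, 1.0), w0=0.0)
```

The filter trades response speed for noise suppression: a larger `beta` smooths the gradient estimate more but makes the procedure react more slowly. Raising `slope` makes the sigmoid approach a threshold element, which is one way the slope can be adjusted during training.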


The problem of the number of solutions is very important in algorithm development. For
neural network algorithms, this problem is related to the multi-extremum character of the
optimization functional and to the possibility that a multi-extremum functional is formed
by varying the structure of the open neural network.
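The link between initial conditions and the number of solutions can be seen in a small sketch: on a multi-extremum functional, the same descent procedure started from different initial conditions ends in different local minima. The functional, the starting points, and the helper names below are hypothetical, and multi-start descent is only one simple strategy, not a method prescribed by the text.

```python
def descend(grad, w0, step=0.01, iters=2000):
    """Plain gradient descent from the initial condition w0."""
    w = w0
    for _ in range(iters):
        w -= step * grad(w)
    return w

def multistart_minima(functional, grad, starts, tol=1e-3):
    """Run descent from several initial conditions and collect the distinct minima found."""
    minima = []
    for w0 in starts:
        w = descend(grad, w0)
        if all(abs(w - m) > tol for m in minima):
            minima.append(w)
    best = min(minima, key=functional)  # deepest minimum among those found
    return minima, best

# Hypothetical multi-extremum functional with two local minima of different depth.
E = lambda w: (w * w - 1.0) ** 2 + 0.3 * w
dE = lambda w: 4.0 * w * (w * w - 1.0) + 0.3
minima, best = multistart_minima(E, dE, starts=[-2.0, -0.5, 0.5, 2.0])
```

Starts to the left of the local maximum near w ≈ 0.075 converge to the deeper minimum near w ≈ -1.036; starts to the right converge to the shallower one near w ≈ 0.960.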

At present, neural network theory is an independent field of science. Its main
prospective lines relate to the solution of complex practical problems. Some of them are
the following:


• Continuum neural networks, with formal treatment of a continuum of input signals, of
input channels, or of neurons in the layers;
• Investigation of the structural or parametric reliability of neural networks used in
neurocomputer implementation technology;
• Parallelization of neural network algorithms for different types of commutation
kernels in super-neurocomputers;
• Neural networks that provide invariance to a group of transformations (for example,
scale invariance or signal invariance, Lorentz invariance, etc.);
• Analytical description of neural networks with continuum adaptation using the
apparatus of Gill linear sequential machines, etc.
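One elementary way to obtain invariance to a particular group, the cyclic shifts of a discrete signal, is to feed the network the magnitudes of the discrete Fourier transform: a cyclic shift by m samples only multiplies each coefficient X[k] by the unit-modulus factor exp(-2πi·k·m/n), so the magnitudes do not change. The sketch below is an assumed preprocessing example for that one group only; it does not reproduce the invariant network structures the text has in mind, and the function name is hypothetical.

```python
import cmath

def dft_magnitudes(signal):
    """|DFT| of a sequence; unchanged by cyclic shifts because a shift only changes phases."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n)]

signal = [1.0, 2.0, 0.0, -1.0, 3.0]
shifted = signal[2:] + signal[:2]  # cyclic shift by two samples
```

The two feature vectors agree up to rounding error. Invariance to other groups named in the text (scaling, rotation, Lorentz transformations) would need different constructions.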


Unfortunately, perhaps because of poor awareness of existing work, a large number of
“home-grown” neural network algorithms keep emerging. The first positive results
obtained with these algorithms can create the illusion that neural network theory is
“complete”. However, this theory is only in the initial phase of its development.
Evidently, the present work does not enumerate all the problems of neural network theory.
Gradual progress in this domain should improve the solution of the vast number of
existing problems and also pose new ones. We believe that neural networks will be the
foremost tool for investigating the complex problems of the modern world.

The present study was performed within the framework of the state contract with the
Federal Agency for Science and Innovations for development works No. 02.435.11.1003.
Conclusion


At present, neural network theory is an independent branch of science. Its main
prospective lines relate to the solution of complex practical problems. The following
fundamental problems can be mentioned:


• Continuum neural networks with a formally considered continuum of input channels,
neurons in the layers, etc.;
• Neural network reliability;
• Neural networks providing invariance to a group of transformations (for example,
shift, rotation, or scaling of patterns or signals);
• Analytical description of neural networks with adaptation circuits using the technique
of Gill linear sequential machines, etc.


The number of scientific investigations in neural network theory is increasing. That is
why an analytical approach is required for a detailed classification of the different
methods of solving the neural network synthesis problem. The most important domain for
applying such approaches is the selection of the a priori information required for
multilayer neural network synthesis in each particular case:


1. A priori characteristics of the neural network teacher instruction space: the number
of pattern classes (2, K, continuum);
2. A priori characteristics of the nonstationarity of the neural network input signal;
3. The neural network teacher qualification function, a function of two arguments
representing the indexes of the corresponding classes;
4. The function of the “teacher’s opinion of his own capabilities”; this is also a
function of two arguments representing the indexes of the corresponding classes;
5. A priori probabilities of the appearance of the classes;
6. A priori characteristics of the neural network solution space (2, K, continuum);
7. The class of primary optimization criteria of the neural network;
8. The loss function that applies when the system assigns a pattern to a wrong class;
9. A priori information about the conditional distribution functions f'(x/e);
10. A priori information about the fixed structure of the open-loop neural network in the
design of a neural network with fixed structure adjusted in a closed cycle;
11. A priori information about the structure class in the design of a neural network
with flexible structure;
12. A priori information about the distinguishing features of the primary and secondary
functionals in the design of a neural network with fixed structure adjusted in a
closed cycle;
13. A priori information about the search method for the secondary optimization
functional;
14. A priori information about the presence and form of the constraints imposed on the
adjustable coefficients;
15. A priori information about the selection method for the coefficients of the
parametric matrix K' of the system searching for the extremum of the secondary
optimization functional;
16. A priori information about the search oscillation parameters in the case when the
neural network adaptation algorithm cannot be designed in the form of an analytical
system;
17. A priori information about the initial conditions for the adjustment procedure;
18. A priori information about the class of typical input signals of the neural network;
19. A priori information about the degree of complication of the open-loop neural network
structure at each step and about the method of realizing such complication.
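Items 13 through 16 concern the search for the extremum of the secondary optimization functional when no analytical adaptation algorithm is available. A minimal sketch, assuming the search oscillations are symmetric trial perturbations of amplitude `delta` applied to one adjustable coefficient at a time, estimates the gradient from functional values alone and then takes a step scaled by per-coefficient gains. The gains are a hypothetical diagonal stand-in for a parametric gain matrix, and all names and values are illustrative rather than taken from the text.

```python
def search_gradient(functional, w, delta):
    """Estimate the gradient from functional values using symmetric trial oscillations."""
    grad = []
    for j in range(len(w)):
        w_plus = list(w)
        w_plus[j] += delta
        w_minus = list(w)
        w_minus[j] -= delta
        grad.append((functional(w_plus) - functional(w_minus)) / (2.0 * delta))
    return grad

def search_adjust(functional, w0, delta=1e-3, gains=None, iters=500):
    """Adjust coefficients using only evaluations of the functional (no analytic gradient)."""
    w = list(w0)
    if gains is None:
        gains = [0.1] * len(w)  # hypothetical diagonal gains
    for _ in range(iters):
        g = search_gradient(functional, w, delta)
        w = [wj - kj * gj for wj, kj, gj in zip(w, gains, g)]
    return w

# Hypothetical secondary functional with its extremum at (1, -2).
J = lambda w: (w[0] - 1.0) ** 2 + 2.0 * (w[1] + 2.0) ** 2
w_opt = search_adjust(J, [0.0, 0.0])
```

The amplitude `delta` illustrates the trade-off named in item 16: a small oscillation gives a more local gradient estimate but is more sensitive to noise in the measured functional, while a large one averages over a wider region.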


An objective comparison between multilayer neural networks of different types must be
based on a comparison of the a priori information available for their design and of
their operation quality on typical and real input signals.